I believe there exist (extrapolated) wishes universal for humans (meaning, true for literally everyone). Among these wishes, I think there is the wish for humans to continue existing.
Does the existence of the Voluntary Human Extinction Movement affect your belief in this proposition?
VHEMT supports human extinction primarily because, in the group’s view, it would prevent environmental degradation. The group states that a decrease in the human population would prevent a significant amount of man-made human suffering.
Obviously, human extinction is not their terminal value.
Or at least, not officially. I have known at least one person who professed to desire that the human race go extinct because he thought the universe as a whole would simply be better if humans did not exist. It’s possible that he was stating such an extreme position for shock value (he did have a tendency to display some fairly pronounced antisocial tendencies), and that he had other values that conflicted with this position on some level. But considering the diversity of viewpoints and values I’ve observed people to hold, I would bet quite heavily against the claim that nobody in the world actually desires the end of human existence.
They are probably lying, trolling, joking, or psychos (=do not have enough extrapolated intelligence and knowledge).
If you’re launching an irreversible CEV, it’s not very safe to rely on your intuition that people expressing such desires are “probably lying, trolling, joking” and so wouldn’t affect the CEV outcome.
I only proposed a hypothesis, which will become testable earlier than the time when CEV could be implemented.
How do you propose to test it without actually running a CEV calculation?
How can we even start defining CEV without brain scanning technology capable of much more than answering the original question?
It would seem that we can define the algorithm which can be used to manipulate and process a given input of loosely defined inconsistent preferences. This would seem to be a necessary thing to do before any actual brain scanning becomes involved.
Well, part of my point is that indeed we can’t even define CEV today, let alone solve it, and so a lot of conclusions/propositions people put forward about what CEV’s output would be like are completely unsupported by evidence; they are mere wishful thinking.
More on-topic: today you have humans as black boxes, but you can still measure what they value, by (1) offering them concrete tradeoffs and measuring their behavior and (2) asking them.
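As a toy illustration of (1), with entirely hypothetical data: offer a series of concrete tradeoffs and bracket the implied value between the highest offer the person rejected and the lowest offer they accepted. This is a sketch of the black-box measurement idea, not a claim about how a CEV process would actually elicit values.

```python
# Toy sketch (hypothetical data): reading a value off observed behavior by
# offering concrete tradeoffs and seeing which offers are accepted.

def bracket_implied_value(offers):
    """offers: list of (amount_offered, accepted) pairs for giving up a good.

    Returns (low, high): the person rejected every offer at or below `low`
    and accepted every offer at or above `high`, so the implied value lies
    between them. Assumes the observed choices are consistent, which real
    behavior is not.
    """
    rejected = [amount for amount, accepted in offers if not accepted]
    accepted = [amount for amount, accepted in offers if accepted]
    low = max(rejected) if rejected else 0.0
    high = min(accepted) if accepted else float("inf")
    return low, high

# Hypothetical observations for one person.
observations = [(10, False), (50, False), (200, True), (500, True)]
print(bracket_implied_value(observations))  # -> (50, 200)
```

Real elicitation would have to cope with noisy and inconsistent choices, which is exactly the “loosely defined inconsistent preferences” problem mentioned above.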
Tomorrow, suppose your new brain scanning tech allows you to perfectly understand how brains work. You can now explain how these values are implemented. But they are the same values you observed earlier. So the only new knowledge relevant to CEV is that you can now derive how people would behave in a hypothetical situation, without actually putting them in that situation (because doing so might be unethical or expensive).
Now, suppose someone expresses a value that you think they are merely “lying, trolling or joking” about. In all of their behavior throughout their lives, and in their own words today, they honestly have this value. But your brain scanner shows that in some hypothetical situation, they would behave in a way consistent with valuing it less.
By construction, since you couldn’t derive this knowledge from their life histories (already known without a brain scanner), these are situations they have (almost) never been in. (And therefore they aren’t likely to be in them in the future, either.)
So why do you effectively say that for purposes of CEV, their behavior in such counterfactual situations reveals “their true values”, while their behavior in the real, common situations throughout their lives doesn’t? Yes, humans might be placed in totally novel situations which can cause them to reconsider their values, because humans have conflicting values, and non-explicit values (behaviors responding to situations rather than stated principles), and no truly top-level goals (so that all values may change). But you could just as easily say that there are probably situations in which you could be placed so that you would come to value their values more.
Your approach places people in the unfortunate position where they might live their whole lives believing in a value, and fighting for it, and then you (or the CEV AI) comes up to them and says: I’m going to destroy everything you’ve valued so far. Not because of objective ethics or a decree of God or a majority vote or anything else objective and external. But because they themselves actually “really” prefer completely different values, even though on the conscious level, no matter how long they might think and talk and read about it, they would never reach that conclusion.
In all of their behavior throughout their lives, and in their own words today, they honestly have this value
This is the conditional that I believe is false when I say “they are probably lying, trolling, joking”. I believe that when you use the brain scanner on those nihilists, and ask them whether they would prefer the world where everyone is dead to any other possible world, and they say yes, the brain scanner would show they are lying, trolling or joking.
OK. That’s possible. But why do you believe that, despite their large numbers and lifelong avowal of those beliefs?
How would you respond if you were subject to such a brain scan and then informed that deep inside you actually are a nihilist who prefers the complete destruction of all life?
I’d think someone’s playing a practical joke on me.
And suppose we develop such brain scanning technology, scan someone else who claims to want the destruction of all life, and it says “yep, he does”: how would you respond?
Dunno… propose to kill them quickly and painlessly, maybe? But why do you ask? As I said, I don’t expect this to happen.
That you don’t expect it to happen shouldn’t by itself be a reason not to consider it. I’m asking because it seems you are avoiding the hard questions by more or less saying you don’t think they will happen. And there are many more conflicting value sets which are less extreme (and apparently more common) than this one.
Errr. This is a question of simple fact, which is either true or false. I believe it’s true, and build my plans accordingly. We can certainly think about contingency plans for what to do if the belief turns out to be false, but so far no one has agreed that the plan is good even in the case where the belief is true.
You’ve lost me. Can you restate the question of simple fact to which you refer here, which you believe is true? Can you restate the plan that you consider good if that question is true?
I believe there exist (extrapolated) wishes universal for humans (meaning, true for literally everyone). Among these wishes, I think there is the wish for humans to continue existing. I would like for AI to fulfil this wish (and other universal wishes if there are any), while letting people decide everything else for themselves.
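Purely as an illustration of what “universal” does in this proposal (the set-of-propositions representation and the names below are hypothetical simplifications, not a real CEV algorithm), the idea can be pictured as acting only on the intersection of everyone’s extrapolated wishes:

```python
# Hypothetical toy model: represent each person's extrapolated wishes as a set
# of propositions and act only on the wishes literally everyone shares.

def unanimous_wishes(extrapolated_wishes):
    """extrapolated_wishes: dict mapping person -> set of wish strings."""
    wish_sets = list(extrapolated_wishes.values())
    if not wish_sets:
        return set()
    # The "universal" wishes are those present in every person's set.
    return set.intersection(*wish_sets)

people = {
    "alice": {"humans continue existing", "no forced wireheading"},
    "bob": {"humans continue existing", "maximize paperclips"},
}
print(unanimous_wishes(people))  # -> {'humans continue existing'}
```

If the intersection turns out to be empty, an AI restricted this way simply has nothing to act on, which is the “stands there doing nothing” case discussed below.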
OK, cool.
To answer your question: sure, if I assume (as you seem to) that the extrapolation process is such that I would in fact endorse the results, and I also assume that the extrapolation process is such that if it takes as input all humans it will produce at least one desire that is endorsed by all humans (even if they themselves don’t know it in their current form), then I’d agree that’s a good plan, if I further assume that it doesn’t have any negative side-effects.
But the assumptions strike me as implausible, and that matters.
I mean, if I assume that everyone being thrown into a sufficiently properly designed blender and turned into stew is a process I would endorse, and I also assume that the blending process has no negative side-effects, then I’d agree that that’s a good plan, too. I just don’t think any such blender is ever going to exist.
Ok, but do you grant that running an FAI with “unanimous CEV” is at least (1) safe, and (2) uncontroversial? That the worst problem with it is that it may just stand there doing nothing—if I’m wrong about my hypothesis?
I don’t know how to answer that question. Again, it seems that you’re trying to get an answer given a whole bunch of assumptions, but that you resist the effort to make those assumptions clear as part of the answer.
It is not clear to me that there exists such a thing as a “unanimous CEV” at all, even in the hypothetical sense of something we might be able to articulate some day with the right tools.
If I nevertheless assume that a unanimous CEV exists in that hypothetical sense, it is not clear to me that only one exists; presumably modifications to the CEV-extraction algorithm would result in different CEVs from the same input minds, and I don’t see any principled grounds for choosing among that cohort of algorithms that don’t in effect involve selecting a desired output first. (In which case CEV extraction is a complete red herring, since the output was a “bottom line” written in advance of CEV’s extraction, and we should be asking how that output was actually arrived at and whether we endorse that process.)
If I nevertheless assume that a single CEV-extraction algorithm is superior to all the others, and further assume that we select that algorithm via some process I cannot currently imagine and run it, and that we then run a superhuman environment-optimizer with its output as a target, it is not clear to me that I would endorse that state change as an individual. So, no, I don’t agree that running it is uncontroversial. (Although everyone might agree afterwards that it was a good idea.)
If the state change nevertheless gets implemented, I agree (given all of those assumptions) that the resulting state-change improves the world by the standards of all humanity. “Safe” is an OK word for that, I guess, though it’s not the usual meaning of “safe.”
I don’t agree that the worst that happens, if those assumptions turn out to be wrong, is that it stands there and does nothing. The worst that happens is that the superhuman environment-optimizer runs with a target that makes the world worse by the standards of all humanity.
(Yes, I understand that the CEV-extraction algorithm is supposed to prevent that, and I’ve agreed that if I assume that’s true, then this doesn’t happen. But now you’re asking me to consider what happens if the “hypothesis” is false, so I am no longer just assuming that’s true. You’re putting a lot of faith in a mysterious extraction algorithm, and it is not clear to me that a non-mysterious algorithm that satisfies that faith is likely, or that the process of coming up with one won’t come up with a different algorithm that antisatisfies that faith instead.)
What I’m trying to do is find some way to fix the goalposts: find a set of conditions on CEV that would be satisfactory. Whether such a CEV actually exists and how to build it are questions for later. Let’s just pile up constraints until a sufficient set is reached. So, let’s assume that:
“Unanimous” CEV exists
And is unique
And is definable via some easy, obviously correct, and unique process, to be discovered in the future,
And it basically does what I want it to do (fulfil universal wishes of people, minimize interference otherwise),
would you say that running it is uncontroversial? If not, what other conditions are required?
No, I wouldn’t expect running it to be uncontroversial, but I would endorse running it.
I can’t imagine any world-changing event that would be uncontroversial, if I assume that the normal mechanisms for generating controversy aren’t manipulated (in which case anything might be uncontroversial).
Why is it important that it be uncontroversial?
I’m not sure. But it seems a useful property to have for an AI being developed. It might allow centralizing the development. Or something.
Ok, you’re right that a complete lack of controversy is impossible, because there are always trolls, cranks, conspiracy theorists, etc. But is it possible to reach a consensus among all sufficiently well-informed, sufficiently intelligent people? Where “sufficiently” is not too high a threshold?
There probably exists (hypothetically) some plan such that it wouldn’t seem unreasonable to me to declare anyone who doesn’t endorse that plan either insufficiently well-informed or insufficiently intelligent.
In fact, there probably exist several such plans, many of which would have results I would subsequently regret, and some of which do not.
I think seeking and refining such plans would be a worthy goal. For one thing, it would make LW discussions more constructive. Currently, as far as I can tell, CEV is very broadly defined, and its critics usually point at some feature and cast (legitimate) doubt on it. Very soon, CEV is apparently full of holes and one may wonder why it is not thrown away already. But they may not be real holes, just places where we do not know enough yet. If these points are identified and stated in the form of questions of fact, which can be answered by future research, then a global plan, in the form of a decision tree, could be made and reasoned about. That would be definite progress, I think.
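As a rough sketch of what such a decision tree could look like (the specific questions and plan branches below are hypothetical placeholders, not a proposal), each internal node is a question of fact answerable by future research and each leaf is the course of action that set of answers would support:

```python
# Hypothetical sketch of a "global plan as a decision tree": internal nodes are
# open questions of fact, leaves are the plans those answers would support.
from dataclasses import dataclass
from typing import Union

@dataclass
class Plan:
    description: str

@dataclass
class Question:
    text: str
    if_yes: Union["Question", "Plan"]
    if_no: Union["Question", "Plan"]

plan_tree = Question(
    text="Do extrapolated wishes shared by literally everyone exist?",
    if_yes=Question(
        text="Is there a unique, obviously correct extraction process?",
        if_yes=Plan("Run the AI on the unanimous wishes; minimize other interference."),
        if_no=Plan("First study how outputs differ across candidate extraction processes."),
    ),
    if_no=Plan("Unanimity fails, so some aggregation rule has to be argued for instead."),
)
```

Laid out this way, each disputed feature of CEV attaches to a specific factual node that research could settle, instead of counting as a hole in the proposal as a whole.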
Agreed that an actual concrete plan would be a valuable thing, for the reasons you list among others.