For example, here’s me in 2008:
“Friendly AI”? What’s one of those then? Friendly—to whom?
Another example from 2008 me:
It seems that we now have hundreds of posts on O.B. discussing “Friendly AI”—and not one seems to explain what the term means. Are we supposed to refer back to earlier writings? Friendly—to whom? What does the term “Friendly” actually mean, if used in a technical context?
I was referred to Coherent Extrapolated Volition and KnowabilityOfFAI.
Coherent Extrapolated Volition offers no definition, but proposes one possible candidate “whom”—in the form of its extrapolated hodge-podge of all humans.
I agree, that’s the “whom” in CEV.
I have an issue with CEV, which is that I don’t think we should extrapolate. We should give the current crop of humans what they want, not what we imagine they might want if they were extrapolated into some finished version of themselves. In the example where Fred wants to kill Steve, CEV says the FAI shouldn’t help because Fred aspires to give up hatred someday and the extrapolated Fred wouldn’t want to kill Steve. On the contrary, I say the FAI shouldn’t help because Steve wants to live more than Fred wants to kill him.
For example, in the original post, if our AI cares about an extrapolated version of speaker A, then it’s possible that that extrapolated A will want different things from the actual present A, so the actual A would be wise to withhold any assistance until he clearly understood the extrapolation process.
On the contrary, I say the FAI shouldn’t help because Steve wants to live more than Fred wants to kill him.
Doesn’t this fall victim to utility monsters, though? If there’s some actor who wants to kill you more than you want not to die, then the FAI would be obliged to kill you. That’s a classic utility monster: an entity that wants harder than anyone else.
One solution is to renorm everyone’s wants, such that the desires of any one sentient being count no more than those of any other. But this leads directly to Parfit’s Repugnant Conclusion¹, or at least to some approximation thereof: maximizing the number of sentient beings, even at the expense of their individual wants or pleasure.
¹ Parfit’s repugnant conclusion, Level 7 Necromancer spell. Material components include the severed head of a utility monster, or a bag of holding filled with orgasmium, which is consumed during casting. Fills a volume of space with maximally dense, minimally happy (but barely nonsuicidal) sentient minds, encoded onto any available material substrate.
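To make both failure modes concrete, here is a toy sketch. It assumes, purely for illustration, that each person’s wants can be boiled down to signed numeric intensities over outcomes; the names, numbers, and the `pick_outcome` function are all made up for this comment.

```python
# Toy model: each person assigns a signed "want intensity" to each outcome.
# Assumption (mine, illustrative only): desires reduce to real numbers.

def pick_outcome(agents, outcomes, renormalize=False):
    """Return the outcome with the highest total want-intensity."""
    def weight(agent):
        if not renormalize:
            return 1.0
        # Renorm: scale each agent so their strongest want has magnitude 1,
        # i.e. nobody's desires count more than anybody else's.
        strongest = max(abs(v) for v in agent.values())
        return 1.0 / strongest if strongest else 0.0

    def total(outcome):
        return sum(weight(a) * a.get(outcome, 0.0) for a in agents)

    return max(outcomes, key=total)

# Raw intensities: the "utility monster" simply wants harder than anyone else.
steve   = {"steve_lives": +10,  "steve_dies": -10}
friend  = {"steve_lives": +3,   "steve_dies": -3}
monster = {"steve_lives": -500, "steve_dies": +500}

print(pick_outcome([steve, friend, monster], ["steve_lives", "steve_dies"]))
# -> steve_dies: on raw intensity, the monster wins.

print(pick_outcome([steve, friend, monster], ["steve_lives", "steve_dies"],
                   renormalize=True))
# -> steve_lives: renorming tames the monster...

# ...but now sheer headcount dominates: 1000 barely-content minds that mildly
# prefer tiling the world with more such minds outvote 10 flourishing people
# who strongly prefer otherwise (a cartoon of the Repugnant Conclusion).
flourishing    = [{"tile": -1.0, "dont_tile": +1.0} for _ in range(10)]
barely_content = [{"tile": +0.1, "dont_tile": -0.1} for _ in range(1000)]

print(pick_outcome(flourishing + barely_content, ["tile", "dont_tile"],
                   renormalize=True))
# -> tile
```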
I agree that you have to renorm everyone’s wants for this to work. I also agree that if you can construct broken minds for the purpose of manipulating the FAI, we need provisions to guard against that. My preferred alternative at the moment follows:
Before people become able to construct broken minds, the FAI cares about everything that’s genetically human.
After we find the first genetically human mind deliberately broken for the purpose of manipulating the FAI, we guess when the FAI started to be influenced by it and, retroactive to just before that time, introduce a new policy: new individuals start out with a weight of 0 and can receive weight transferred from their parents, so the total weight is conserved. I don’t want an economy of weight-transfer to arise, so transfers would be one-way and irreversible.
This might lead to a few people running around with a weight of 0 because their parents never made the transfer. This would be suboptimal, but it would not have horrible consequences, because the AI would care about the parents, who probably care about the new child, so the AI would in effect care somewhat about the new child.
Death of the parents doesn’t break this. Caring about the preferences of dead people is not a special case.
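To make bug-hunting easier, here is a minimal sketch of the bookkeeping I have in mind, so there is something concrete to poke at. Everything in it is my own illustrative framing; the class, its method names, and the choice of 1.0 as the grandfathered weight are assumptions, not part of any existing design.

```python
class WeightRegistry:
    """Toy bookkeeping for the proposed caring-weight scheme.

    - People registered before the cutoff ("grandfathered") start at weight 1.
    - People registered after the cutoff start at weight 0.
    - Weight can only be transferred, never created, so the total is conserved.
    - Transfers are one-way and irreversible: there is deliberately no method
      to undo one, to discourage an economy of weight-trading.
    - Nobody is ever removed: the FAI keeps caring about the preferences of
      dead people at their recorded weight.
    """

    def __init__(self):
        self.weights = {}  # person_id -> caring weight (entries never deleted)

    def register(self, person_id, grandfathered):
        assert person_id not in self.weights
        self.weights[person_id] = 1.0 if grandfathered else 0.0

    def transfer(self, parent_id, child_id, amount):
        """One-way, conserved transfer of weight from parent to child."""
        assert 0 < amount <= self.weights[parent_id]
        self.weights[parent_id] -= amount
        self.weights[child_id] += amount

    def total_weight(self):
        # Invariant: always equals the number of grandfathered people.
        return sum(self.weights.values())


# Example: a grandfathered parent endows a post-cutoff child.
registry = WeightRegistry()
registry.register("parent", grandfathered=True)
registry.register("child", grandfathered=False)
registry.transfer("parent", "child", 0.5)
assert registry.total_weight() == 1.0
```

The weight-0 case above falls out naturally here: a child whose parents never make the transfer has weight 0 in the registry, but the parents’ own weighted preferences still cover the child’s welfare.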
I encourage people to reply to this post with bugs in this alternative or with other plausible alternatives.
I agree completely that the extrapolation process as envisioned in CEV leads to the system doing all manner of things that the original people would reject.
It is also true that maturation often leads to adult humans doing all manner of things that their immature selves would reject. And, sure, it’s possible to adopt the “Peter Pan” stance of “I don’t wanna grow up!” in response to that, though it’s hard to maintain in the face of social expectations and biological imperatives.
It is not, however, clear to me that a seven-year-old would be wise to reject puberty, or that we would, if offered a pill that ensured that we would never come to prefer anything different from what we prefer right now, be wise to collectively take it.
That extrapolation leads to something different isn’t clearly a reason to reject it.
It is not, however, clear to me that a seven-year-old would be wise to reject puberty
The difference between a seven-year-old and an adult is a transition into a well-understood state that many have been in. The most powerful people in society are already in that state. In contrast, the AI’s extrapolation is a transition into a completely new state that nobody knows anything about, except possibly the AI. The analogy isn’t valid.
That extrapolation leads to something different isn’t clearly a reason to reject it.
It’s a wildcard with unknown and unknowable consequences. That’s not a good thing to have in a Friendly AI. The burden of proof should be on the people who want to include it. As I mentioned earlier, it’s not the best solution to the Fred-wants-to-murder-Steve problem, since it’s more reliable to look at Steve’s present desire to live than to hope that extrapolated-Fred doesn’t want to murder. So it isn’t needed to solve that problem. What problem does it solve?
I agree with you that puberty is a familiar, well-understood transition, whereas extrapolation is not. It’s not clear to me that reasoning from familiar cases to novel ones by analogy is invalid, but I certainly agree that reasoning by analogy doesn’t prove much of anything, and you’re entirely justified in being unconvinced by it.
I agree with you that anyone who wants to flip the switch on what I consider a FOOMing CEV-implementing AI has an enormous burden of proof to shoulder before they get my permission to do so. (Not that I expect they will care very much.)
I agree with you that if we simply want to prevent the AI from killing people, we can cause it to implement people’s desire to live; we don’t need to extrapolate Fred’s presumed eventual lack of desire-to-kill to achieve that.
My one-sentence summary of the problem CEV is intended to solve (I do not assert that it does so) is “how do we define the target condition for a superhuman environment-optimizing system in such a way that we can be confident that it won’t do the wrong thing?”
That is expanded on at great length in the Metaethics and Fun Theory sequences, if you’re interested. Those aren’t the clearest conceivable presentation, but I doubt I will do better in a comment and am not highly motivated to try.
My one-sentence summary of the problem CEV is intended to solve (I do not assert that it does so) is “how do we define the target condition for a superhuman environment-optimizing system in such a way that we can be confident that it won’t do the wrong thing?”
My question was meant to be “What problem does extrapolation solve?”, not “What problem is CEV intended to solve?” To answer the former question, you’d need some example that can be solved with extrapolation that can’t easily be solved without it. I can’t presently see a reason the example should be much more complicated than the Fred-wants-to-kill-Steve example we were talking about earlier.
That is expanded on at great length in the Metaethics and Fun Theory sequences, if you’re interested.
I might read that eventually, but not for the purpose of getting an answer to this question. I have no reason to believe the problem solved by extrapolation is so complex that one needs to read a long exposition to understand the problem. Understanding why extrapolation solves the problem might take some work, but understanding what the problem is should not. If there’s no short description of a problem that requires extrapolation to solve it, it seems likely to me that extrapolation does not solve a problem.
For example, integral calculus is required to solve the problem “What is the area under this parabola?”, given enough parameters to uniquely determine the parabola. Are you seriously saying that extrapolation is necessary but its role is more obscure than that of integral calculus?
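To spell the analogy out: once the coefficients that pin down the parabola and the two endpoints are given, the problem and its calculus solution each fit on a single line.

\[
\int_{x_0}^{x_1}\bigl(ax^2+bx+c\bigr)\,dx \;=\; \frac{a}{3}\bigl(x_1^3-x_0^3\bigr)+\frac{b}{2}\bigl(x_1^2-x_0^2\bigr)+c\,(x_1-x_0)
\]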
Are you seriously saying that extrapolation is necessary but its role is more obscure than that of integral calculus?
What I said was that the putative role of extrapolation is avoiding optimizing for the wrong thing.
That’s not noticeably more complicated a sentence than “the purpose of calculus is to calculate the area under a parabola”, so I mostly think your question is rhetorically misleading.
Anyway, as I explicitly said, I’m not asserting that extrapolation solves any problem at all. I was answering (EDIT: what I understood to be) your question about what problem it’s meant to solve, and providing some links to further reading if you’re interested, which it sounds like you aren’t, which is fine.
Ah, I see. I was hoping to find an example, about as concrete as the Fred-wants-to-kill-Steve example, that someone believes actually motivates extrapolation. A use-case, as it were.
You gave the general idea behind it. In retrospect, that was a reasonable interpretation of my question.
I’m not asserting that extrapolation solves any problem at all.
Okay, so you don’t have a use case. No problem, I don’t either. Does anybody else?
I realize you haven’t been online for a few months, but yes, I do.
Humanity’s desires are not currently consistent. An FAI couldn’t satisfy them all because some of them contradict each other, like Fred’s and Steve’s in your example. There may not even be a way of averaging them out fairly or meaningfully. Either Steve lives or dies: there’s no average or middle ground and Fred is just out of luck.
However, it might be the case that human beings are similar enough that if you extrapolate everything all humans want, you get something consistent. Extrapolation is a tool to resolve inconsistencies and please both Fred and Steve.
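For what it’s worth, here is a toy version of what “you get something consistent” could mean operationally. It assumes (my assumption, nothing from the CEV document) that each person’s extrapolated wants can be written as yes/no stances on propositions, so that coherence is simply the absence of direct contradictions after extrapolation.

```python
# Toy check: does extrapolation turn contradictory wants into a coherent set?
# Assumption (illustrative only): wants are (proposition, True/False) stances.

def coherent(stances):
    """True iff no proposition is both affirmed and denied across people."""
    combined = {}
    for person in stances:
        for prop, value in person.items():
            if prop in combined and combined[prop] != value:
                return False
            combined[prop] = value
    return True

# Present-day wants: Fred and Steve directly contradict each other.
present = [
    {"steve_lives": False},  # Fred
    {"steve_lives": True},   # Steve
]

# Hypothesized extrapolated wants: the hope is that the contradiction
# disappears once Fred is the person he wishes he were.
extrapolated = [
    {"steve_lives": True, "fred_free_of_hatred": True},  # extrapolated Fred
    {"steve_lives": True},                               # extrapolated Steve
]

print(coherent(present))       # False: no way to satisfy both as they stand
print(coherent(extrapolated))  # True: what CEV hopes extrapolation yields
```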
I have an issue with CEV, which is that I don’t think we should extrapolate.
Amen brother!
It would be good if Eliezer (or someone who understands his thinking) could explain just why it is so important to extrapolate—rather than, for example, using the current volition of mankind (construed as a potentially moving target). I worry that extrapolation is proposed simply because Eliezer doesn’t much care for the current volition of mankind and hopes that the extrapolated volition will be more to his taste. Of course, another explanation is that it has something to do with the distaste for discounting.
This is the Knew more … Thought faster … Were more the people we wished we were … section of CEV.
Yes, that ‘poetry’ explains what extrapolation is, but not why we need to risk it. To my mind, this is the most dangerous aspect of the whole FAI enterprise. Yet we don’t have anything approaching a requirements analysis—instead we get a poetic description of what Eliezer wants, a clarification of what the poetry means, but no explanation of why we should want that. It is presumed to be obvious that extrapolating can only improve things. Well, let’s look more closely.
… if we knew more, …
An AI is going to tell us what we would want, if only we knew more. Apparently, there is an assumption here that the AI knows things we don’t. Personally, I worry a bit that an AI will come to believe things that are not true. In fact, I worry about it most when the AI claims to know something that mankind does not know—something dealing with human values. Why do I worry about that? Something someone wrote somewhere, presumably. But maybe that is not the kind of superior AI ‘knowledge’ that Eliezer is talking about here.
Knew more: Fred may believe that box A contains the diamond, and say, “I want box A.” Actually, box B contains the diamond, and if Fred knew this fact, he would predictably say, “I want box B.”
And instead of extrapolating, why not just inform Fred where the diamond is? At this point, the explanation becomes bizarre.
If Fred would adamantly refuse to even consider the possibility that box B contains a diamond, while also adamantly refusing to discuss what should happen in the event that he is wrong in this sort of case, and yet Fred would still be indignant and bewildered on finding that box A is empty, Fred’s volition on this problem is muddled.
Am I alone in preferring, in this situation, that the AI not diagnose a ‘muddle’, and instead give Fred box A after offering him the relevant knowledge?
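To pin down the alternative I keep gesturing at, here is a minimal sketch of the two policies on the diamond example. The function names and the framing are mine, purely illustrative, and not taken from the CEV document.

```python
# Two ways an AI can handle Fred, who wants "the box with the diamond"
# but falsely believes that box is A. (Toy framing, not from CEV.)

KNOWLEDGE = {"diamond_is_in": "B"}

def extrapolate_and_act(stated_choice):
    """Extrapolation-style: act on what Fred *would* choose if he knew more."""
    # Fred's actual statement is ignored; the AI hands over box B regardless.
    return KNOWLEDGE["diamond_is_in"]

def inform_then_act(stated_choice, fred_decides):
    """Oracle-style: tell Fred what the AI knows, then do what he says."""
    return fred_decides(stated_choice, KNOWLEDGE)

# Fred may update ("oh, give me B then") or dig in ("I still want A").
persuaded_fred = lambda choice, facts: facts["diamond_is_in"]
stubborn_fred  = lambda choice, facts: choice

print(extrapolate_and_act("A"))              # B, no matter what Fred says
print(inform_then_act("A", persuaded_fred))  # B, because Fred updated himself
print(inform_then_act("A", stubborn_fred))   # A, and that remains Fred's call
```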
Thought faster: Suppose that your current self wants to use an elaborate system of ropes and sticks to obtain a tasty banana, but if you spent an extra week thinking about the problem, you would predictably see, and prefer, a simple and elegant way to get the banana using only three ropes and a teddy bear.
Again, if the faster thinking allows the AI to serve as an oracle, making suggestions that even our limited minds can appreciate once we hear them, then why should we take the risk of promoting the AI from oracle to king? The AI should tell us things rather than speaking for us.
Were more the people we wished we were: Any given human is inconsistent under reflection. We all have parts of ourselves that we would change if we had the choice, whether minor or major.
When we have a contradiction between a moral intuition and a maxim codifying our system of moral standards there are two ways we can go—we can revise the intuition or we can revise the maxim. It makes me nervous having an AI make the decisions leading to ‘reflective equilibrium’ rather than making those decisions myself. Instead of an extrapolation, I would prefer a dialog leading me to my own choice of equilibrium rather than having a machine pick one for me. Again, my slogan is “Speak to us, don’t speak for us.”
Where our wishes cohere rather than interfere
I’m not sure what to make of this one. Is there a claim here that extrapolation automatically leads to coherence? If so, could we have an argument justifying that claim? Or, is the point that the extrapolation specification has enough ‘free play’ to allow the AI to guide the extrapolation to coherence? Coherence is certainly an important issue. A desideratum? Certainly. A requirement? Maybe. But there are other ways of achieving accommodation without trying to create an unnatural coherence in our diverse species.
These are topics that really need to be discussed in a format other than poetry.
An AI is going to tell us what we would want, if only we knew more. Apparently, there is an assumption here that the AI knows things we don’t. Personally, I worry a bit that an AI will come to believe things that are not true. In fact, I worry about it most when the AI claims to know something that mankind does not know—something dealing with human values. Why do I worry about that? Something someone wrote somewhere presumably. But maybe that is not the kind of superior AI ‘knowledge’ that Eliezer is talking about here.
Rebuttal: Most people in the world believe in a religion that is wrong. (This conclusion holds regardless of which, if any, world religion happens to be true.) Would we want an AI that enforces the laws of a false religion because people want the laws of their religion enforced? (Assume that people would agree that the AI shouldn’t enforce the laws of false religions.)
If Fred would adamantly refuse to even consider the possibility that box B contains a diamond, while also adamantly refusing to discuss what should happen in the event that he is wrong in this sort of case, and yet Fred would still be indignant and bewildered on finding that box A is empty, Fred’s volition on this problem is muddled.
Am I alone in preferring, in this situation, that the AI not diagnose a ‘muddle’, and instead give Fred box A after offering him the relevant knowledge?
What if box A actually contains a bomb that explodes when Fred opens it? Should the AI still give Fred the box?
Is there a claim here that extrapolation automatically leads to coherence?
As I understand CEV, the hope is that it will, and if it doesn’t, CEV is said to fail. Humanity may not have a CEV.