Simple solution: Build an FAI that optimizes the universe according to your own utility function instead of humanity’s average utility function. The two will be nearly the same thing anyway (remember, you were tempted to have the FAI use the average human utility function instead, so clearly you sincerely care about other people’s wishes). And in weird situations where the two are radically different (like this one), your own utility function more closely tracks the intended purpose of an FAI.
Here’s what I’ve been trying to say: the thing that you want an FAI to do is optimize the universe according to your utility function; that is the definition of your utility function. This will be very close to the average human utility function, because you care about what other people want. If you do not want the FAI to do things like punishing people you hate (and I assume that you don’t), then your utility function assigns a great weight to the desires of other people, and if an FAI with your utility function does such a thing, it must have been misprogrammed. The only reason to use the average human utility function instead is TDT: if that is what you are going to work towards, people are more likely to support your work. However, if you can convince them that, on average, your utility function is expected to be closer to theirs than the average human’s is, because of situations like this one, then that should not be an issue.
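To make that concrete, here is a minimal toy sketch. The names, weights, and numbers below are all hypothetical, not a claim about how a real FAI would represent utilities; the point is only that a personal utility function which heavily weights other people’s preferences ranks ordinary outcomes almost exactly like the population average, and only comes apart in pathological cases like the one in the top-level post:

```python
# Toy sketch: a personal utility function that puts heavy weight on other
# people's preferences ranks ordinary outcomes the same way the population
# average does, and only diverges on pathological outcomes.
# All names and numbers are made up purely for illustration.

population_prefs = {
    "outcome_a": [0.9, 0.8, 0.7, 0.9],           # how much each other person likes it
    "outcome_b": [0.2, 0.3, 0.1, 0.2],
    "torture_me_forever": [1.0, 1.0, 1.0, 1.0],  # pathological case: everyone else wants it
}

my_intrinsic_prefs = {
    "outcome_a": 0.8,
    "outcome_b": 0.4,
    "torture_me_forever": -1000.0,               # I care enormously about not being tortured
}

ALTRUISM_WEIGHT = 0.9  # how strongly my utility function tracks others' wishes


def average_utility(outcome):
    prefs = population_prefs[outcome]
    return sum(prefs) / len(prefs)


def my_utility(outcome):
    return ((1 - ALTRUISM_WEIGHT) * my_intrinsic_prefs[outcome]
            + ALTRUISM_WEIGHT * average_utility(outcome))


for outcome in population_prefs:
    print(outcome, round(average_utility(outcome), 2), round(my_utility(outcome), 2))
# outcome_a and outcome_b are ranked identically by both functions;
# only the pathological case separates them.
```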
I dispute this claim:
There is a worrying tendency on LW to verbally acknowledge moral antirealism but then argue as if moral realism were true. We have little idea how much our individual extrapolations will disagree about what to do with the universe; indeed, there is serious doubt over just how weird those extrapolations will seem to us. There is no in-principle reason for humans to agree on what to do under extrapolation, and in practice we tend to disagree a lot before extrapolation.
I did not intend to imply that moral realism was true. If I somehow seemed to indicate that, please explain so I can make the wording less confusing.
True, but many of the disagreements between people concern methods rather than goals or morals, and those disagreements are not relevant under extrapolation. Plus, I want other people to get what they want, so if an AI programmed to optimize the universe according to my utility function does not do something fairly similar to optimizing it according to the average human utility function, then either the AI is misprogrammed or the average human utility function has changed radically through unfavorable circumstances like the one described in the top-level post. I suspect that the same is true of you. And if you do not want other people to get what they want, what is the point of using the average human utility function in the first place?
Bob wants the AI to create as close an approximation to hell as possible, and throw you into it forever, because he is a fundamentalist Christian.
Are you sure you want Bob to get what he wants?
Most fundamentalist Christians, although they believe that there is a hell and that people like me are destined for it, and want their religion to be right, probably would not want an approximation of their religion created conditional on it not already being right. An AI cannot make Bob right.
That being said, there probably are some people who would want me thrown into hell anyway, even if their religion, which stipulates that I would be, was not right in the first place. I should amend my statement: I want people to get what they want in ways that do not conflict, or conflict only minimally, with what other people want. Also, the possibility that there are a great many people like Bob (as I said, I’m not quite sure how many fundamentalists would want to make their religion true even if it isn’t) is a very good reason not to use the average human utility function for the CEV. As you said, I do not want Bob to get what he wants, and I suspect that you don’t either. So why would you want to create an FAI with a CEV that is inclined to accommodate Bob’s wish (which greatly conflicts with what other people want) if it proves especially popular?
CEV doesn’t just average people’s wishes. It extrapolates what people would do if they were better informed. Even if Bob wants to create a hell right now, his extrapolated volition may be for something else.
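A minimal sketch of the distinction, with an entirely made-up extrapolation rule, just to show where the “better informed” step sits relative to any averaging:

```python
# Toy illustration of the difference between aggregating people's stated
# wishes and aggregating their *extrapolated* wishes (what they would want
# if they knew more).  The extrapolation rule here is purely hypothetical.

stated_wishes = {
    "Alice": "flourishing_for_everyone",
    "Bob": "hell_for_unbelievers",   # what Bob says he wants right now
}


def extrapolated_volition(wish):
    """Hypothetical rule: a wish that depends on a false factual belief
    (e.g. 'my religion is literally true') does not survive being
    better informed."""
    if wish == "hell_for_unbelievers":
        return "flourishing_for_everyone"
    return wish


extrapolated = {person: extrapolated_volition(wish)
                for person, wish in stated_wishes.items()}

print(stated_wishes["Bob"])   # hell_for_unbelievers
print(extrapolated["Bob"])    # flourishing_for_everyone
# Any averaging happens over the second dictionary, not the first.
```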
I wouldn’t.
Well, I suppose we can reliably expect that there are not enough people like Bob, and that my being tortured removes much more utility from me than it gives Bob, but that’s missing the point.
Imagine yourself in a world in which the vast majority of people want to subject a certain minority group to eternal torture. The majority is so vast that an FAI with a CEV based on the average human utility function would be likely to subject the members of that minority group to eternal torture. You have the ability to create an FAI with a CEV based on the average human utility function, an FAI based on your personal utility function, or no FAI at all. What do you do?
With my personal utility function, of course; an FAI built that way would, by my definition of the term “right”, always do the right thing.
Silly me, I thought that we were arguing about whether using a personal utility function is a better substitute, and I was rather confused at what appeared to be a sudden concession. Looking at the comments above, I notice that you in fact only disputed my claim that the results would be very similar.
I want Bob to think he gets what he wants.
There are a lot of different positions people could take, and I think you often demand unreasonable dichotomies. First, there is something more like a trichotomy of realism, (anti-realist) cognitivism, and anti-cognitivism. Only partially dependent on that is the question of extrapolation. One could believe that there is a (human-)right answer to human moral questions here and now, without believing that weirder questions have right answers or that the answers to simple questions would be invariant under extrapolation.
Just because philosophers are wasting the term “realism” doesn’t mean that it’s a good idea to redefine it. You are the one guilty of believing that everyone will converge on a meaning for the word.
I happen to agree with the clause you quote, because I think the divergence within a single person is so great as to swamp the divergence between 6 billion people. I imagine that if one could contain that divergence, one would hardly worry about the problem of different people.
Today, people tend to spend more time worrying about the threat that other people pose than about the threat that they themselves (in another mood, perhaps) pose.
This might weakly indicate that inter-person divergence is bigger than intra-person divergence.
Looking from another angle, what internal conflicts are going to be persistent and serious within a person? It seems to me that I don’t have massive trouble reconciling different moral intuitions, compared to the size and persistence of, say, the Israel-Palestine conflict, which is an inter-person conflict.
The difference between Eliezer’s cognitivism and the irrealist stance of, e.g., Greene is just syntactic; they mean the same thing. That is, they both mean that values are arbitrary products of chance events rather than logically derivable truths.
This seems to track with Eliezer’s fictional “conspiracies of knowledge”: if we don’t want our politicians to get their hands on our nuclear weapons (or the theory behind their operation), then why should they be allowed a say in what our FAI thinks?
Besides, the purpose of CEV is to extrapolate the volition humanity would have if it were more intelligent—and since you just created the first AI, you are clearly the most intelligent person in the world (not that you didn’t already know that). Therefore, using your own current utility function is an even better approximation than trying to extrapolate humanity’s volition to your own level of intelligence!
“I was tempted not to kill all those orphans, so clearly, I’m a compassionate and moral person.”
That’s not an accurate parallel. The fact that you thought it was a good idea to use the average human utility function proves that you expect it to produce a result almost identical to that of an FAI using your own utility function. If the average human wants you not to kill the orphans, and you also want not to kill the orphans, it doesn’t matter which algorithm you use to decide not to kill the orphans.
I think that you’re looking too deeply into this; what I’m trying to say is that accepting excuses of the form “I was tempted to do ~x before doing x, so clearly I have properties characteristic of someone who does ~x” is a slippery slope.
If you killed the orphans because otherwise Dr. Evil would have converted them into clones of himself and taken over the world, then your destruction of the orphanage is more indicative of a desire for Dr. Evil not to take over the world than of any opinion on orphanages.
The fact that you were tempted not to destroy the orphanage (despite the issue of Dr. Evil) is indicative of the fact that you don’t want to kill orphans.
I don’t see how it is slippery at all. Instead, it seems that you have simply jumped off the slope.
If you were tempted to save the orphans, you have some properties that lead to not killing orphans. You likely share some properties with compassionate, moral people.
That doesn’t make you compassionate or moral. I’m often tempted to murder people by cutting out their heart and shoving it into their mouth.
This doesn’t make me a murderer, but it does mean I have some properties characteristic of murderers.
What if you’re a preference utilitarian?
If you are a true preference utilitarian, then the FAI will implement preference utilitarianism when it maximizes your utility function.
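In toy form (the scores are made up, and a real preference utilitarian’s utility function is obviously not a two-row table): being a “true preference utilitarian” means your personal utility function already is aggregate preference satisfaction, so maximizing “your” utility function and maximizing aggregate preference satisfaction pick the same outcome by construction:

```python
# Toy sketch: if my utility function simply *is* aggregate preference
# satisfaction, an FAI maximizing my utility function is thereby
# implementing preference utilitarianism.  Scores are made up.

satisfaction = {
    "world_1": {"me": 0.9, "dr_evil": 0.1, "everyone_else": 0.7},
    "world_2": {"me": 0.2, "dr_evil": 1.0, "everyone_else": 0.9},
}


def aggregate_satisfaction(world):
    return sum(satisfaction[world].values())


def my_utility(world):
    # A "true preference utilitarian" has no separate personal term:
    return aggregate_satisfaction(world)


# Both criteria select the same world, by construction.
assert max(satisfaction, key=my_utility) == max(satisfaction, key=aggregate_satisfaction)
print(max(satisfaction, key=my_utility))
```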
My point was that a preference utilitarian would let Dr. Evil rule the world in that scenario.
Although, obviously, if you’re a preference utilitarian then that’s what you actually want.