People can hold different moral views. Sometimes these views are opposed, and any compromise would be called immoral by at least one of them. Any AI that enforced such a compromise would be called unFriendly by at least one of them.
Even for a moral realist (and I don’t think well of that position), the above remains true, because people demonstrably have irreconcilably different moral views. If you’re a moral realist, you have two options:
1. Implement objective moral truth however defined, and ignore everyone’s actual moral feelings. In which case FAI is irrelevant—if the moral truth tells you to be unFriendly, you do it.
2. Implement some pre-chosen function—your own morals, or many people’s morals like CEV, or some other thing that does not depend on moral truth.
If you’re a moral anti-realist, you can only choose option 2, because no moral truth exists. That’s the only difference that stems from being a moral realist rather than an anti-realist.
Does this mean that a Friendly-to-everyone AI is impossible under moral anti-realism? Certainly, because people have fundamental moral disagreements. But moral realism doesn’t help! It just adds the option of following some “moral facts” that some or all humans disagree with, which is no better in terms of Friendliness than the existing options. (If all humans agreed with some set of purported moral facts, people wouldn’t have needed to invent the concept of moral facts in the first place.)
The existence of moral disagreement, standing alone, is not enough to show moral realism is false. After all, scientific disagreement doesn’t show physical realism is false.
Further, I am confused by your portrayal of moral realists. Presumably, the reality of moral facts would show that people acting contrary to those facts were making a mistake, much like people who thought “Objects in motion will tend to come to a stop” were making a mistake. It seems strange to call correcting that mistake “ignoring everyone’s actual scientific feelings.” Likewise, if I am unknowingly doing wrong, and you can prove it, I would not view that correction as ignoring my moral feelings—I want to do right, not just think I am doing right.
In short, I think that the position you are labeling “moral realist” is just a very confused version of moral anti-realism. Moral realists can and should reject the idea that the mere existence of moral disagreement at any particular moment is useful evidence about whether there is one right answer. In other words, a distinction should be made between the existence of moral disagreement and the long-term persistence of moral disagreement.
The existence of moral disagreement, standing alone, is not enough to show moral realism is false.
I didn’t say that it was. Rather, I pointed out the difference between morality and Friendliness.
For an AI to be Friendly towards everyone requires not moral realism but “Friendliness realism”—which is basically the idea that a single behavior of the AI can satisfy everyone. This is clearly false if “everyone” means “all intelligences, including aliens, other AIs, etc.” It may be true if we restrict ourselves to “all humans” (and stop humans from diversifying too much, and don’t include hypothetical or far-past humans).
Personally, I believe the burden of proof is on those who think this is possible to demonstrate that it is. My prior for “all humans” says they are a very diverse and selfish bunch and are not going to be satisfied by any one arrangement of the universe.
Regardless, moral realism and Friendliness realism are different. If you built an objectively moral but unFriendly AI, that’s the scenario I discussed in my previous comment—and people would be unhappy. OTOH, if you think a Friendly AI is by logical necessity a moral one (under moral realism), that’s a very strong claim about objective morals—a claim that people would perceive an AI implementing objective morals as Friendly. This is a far stronger claim than that people who are sufficiently educated and exposed to the right knowledge will come to agree with certain universal objective morals. A Friendly AI means one that is Friendly to people as they really are, here and now. (As I said, to me it seems very likely that an AI cannot in fact be Friendly to everyone at once.)
I think we are simply having a definitional dispute. As the term is used generally, moral realism doesn’t mean that each agent has a morality, but that there are facts about morality that are external to the agent (i.e. objective). Now, “objective” is not identical to “universal,” but in practice, objective facts tend to cause convergence of beliefs. So I think what I am calling “moral realism” is something like what you are calling “Friendliness realism.”
Lengthening the inferential distance further is the fact that realism is a two-place word. As you noted, there is a distinction between realism(Friendliness, agents) and realism(Friendliness, humans).
That said, I would expect that “people would perceive an AI implementing objective morals as Friendly” if I believed that objective morals exist. I’m not sure why you think that’s a stronger claim than “people who are sufficiently educated and exposed to the right knowledge will come to agree with certain universal objective morals.” If you believed that there were objective moral facts and knew the content of those facts, wouldn’t you try to adjust your beliefs and actions to conform to those facts, in the same way that you would adjust your physical-world beliefs to conform to objective physical facts?
I think we are simply having a definitional dispute.
That seems likely. If moral realists think that morality is a one-place word, and anti-realists think it’s a two-place word, we would be better served by using two distinct words.
It is somewhat unclear to me what moral realists are thinking of, or claiming, about whatever it is they call morality. (Even after taking into account that different people identified as moral realists do not all agree on the subject.)
So I think what I am calling “moral realism” is something like what you are calling “Friendliness realism.”
I defined ‘Friendliness (to X)’ as ‘behaving towards X in the way that is best for X, in some implied sense’. Obviously there is no Friendliness towards everyone, but there might be Friendliness towards humans: then “Friendliness realism” (my coining) is the belief that there is a single Friendly-towards-humans behavior that will in fact be Friendly towards all humans, whereas Friendliness anti-realism is the belief that no single behavior would satisfy all humans, and that any behavior would inevitably be unFriendly towards some of them.
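To make the quantifier structure of that distinction explicit, here is a minimal formalization in my own notation (it assumes a two-place predicate Friendly(b, h) over candidate AI behaviors b and humans h in a set H; none of these symbols are defined anywhere in this discussion, so treat them as illustrative only):

$$\text{Friendliness realism:} \quad \exists b\, \forall h \in H : \mathrm{Friendly}(b, h)$$
$$\text{Friendliness anti-realism:} \quad \forall b\, \exists h \in H : \neg\,\mathrm{Friendly}(b, h)$$

The second is just the negation of the first; the disagreement is over which way the quantifiers actually come out when H is the set of all humans and Friendliness is pinned down by the assumptions discussed next.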
Clearly this discussion assumes many givens. Most importantly: 1) what exactly counts as being Friendly towards someone (are we utilitarian? Of what kind? Must we agree with the target human as to what is Friendly towards them? If we influence them to come to like us, when is that allowed?); and 2) what is the set of ‘all humans’? Do past, distant, expected future, or entirely hypothetical people count? What is the value of creating new people? Etc.
My position is that: 1) for most commonly assumed answers to these questions, I am a “Friendliness anti-realist”: I do not believe any one behavior by a superpowerful, universe-optimizing AI would count as Friendliness towards all humans at once; and 2) insofar as I have seen moral realism explained, it seems to me to be incompatible with Friendliness realism. But it’s possible that some people mean something entirely different by “morals” and by “moral realism” than what I’ve read.
If you believed that there were objective moral facts and knew the content of those facts, wouldn’t you try to adjust your beliefs and actions to conform to those facts
That’s a tautology: yes, I would. But the assumption is not valid.
Even if you assume there exist objective moral facts (whatever you take that to mean), it does not follow that you would be able to convince other people that these are the true moral facts! I believe it is extremely likely that you would not be able to convince people—just as today most people in the world seem to be moral realists (mostly religious), yet they hold widely differing moral beliefs, and when they do convert to another set of beliefs it is almost never due to some sort of rational convincing.
It would be nice to live in a world where you could start from the premise that “people believe that there are objective moral facts and know the content of those facts”. But in practice we, and any future FAI, will live in a world where most people will reject mere verbal arguments for new morals that contradict their current ones.