But even if I’m wrong, if there’s a Least Convenient Possible World where there are otherwise normal humans who have “kill all gays” irreversibly and directly programmed into their utility function, I don’t think a proper CEV would take that into account.
I’m not sure what to make of your use of the word “proper”. Are you predicting that a CEV will not be utilitarian or saying that you don’t want it to be?
I am saying that a CEV that extrapolated human morality would generally be utilitarian, but that it would grant a utility value of zero to satisfying what I call “malicious preferences.” That is, if someone valued frustrating someone else’s desires purely for its own sake, rather than because they needed the resources that person was using or something like that, the AI would not fulfill that preference.
This is because I think that a CEV of human morality would find the concept of malicious preferences to be immoral and discard or suppress it. My thinking on this was inspired by reading about Bryan Caplan’s debate with Robin Hanson, where Bryan mentioned:
...Robin endorses an endless list of bizarre moral claims. For example, he recently told me that “the main problem” with the Holocaust was that there weren’t enough Nazis! After all, if there had been six trillion Nazis willing to pay $1 each to make the Holocaust happen, and a mere six million Jews willing to pay $100,000 each to prevent it, the Holocaust would have generated $5.4 trillion worth of consumers surplus.
I don’t often agree with Bryan’s intuitionist approach to ethics, but I think he made a good point: satisfying the preferences of those six trillion Nazis doesn’t seem like part of the meaning of right, and I think a CEV of human ethics would reflect this. I think that the preference of the six million Jews to live should be respected and the preferences of the six trillion Nazis ignored.
I don’t think this is because of scope insensitivity, or because I am not a utilitarian. I endorse utilitarian ethics for the most part, but I think that satisfying “malicious preferences” generates zero or negative utility, no matter how many people hold them. For conflicts of preferences that involve things like disputes over the use of scarce resources, normal utilitarianism applies.
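To make that concrete, here is a quick toy calculation (just my own illustration in Python, not a claim about how an actual CEV would aggregate anything; the aggregate helper and the “malicious” flag are things I made up around Caplan’s numbers): a plain willingness-to-pay sum endorses the Holocaust in his example, while the same sum with malicious preferences weighted to zero comes out firmly against it.

```python
# Rough toy model (my own illustration only): compare a straight
# willingness-to-pay sum with one that gives "malicious" preferences zero
# weight, using the figures from Caplan's example. The 'malicious' flag is
# exactly the judgment call I'm claiming an extrapolated human morality would
# make; nothing here says how that judgment would actually be made.

def aggregate(preferences, zero_out_malicious=False):
    """Sum holders * willingness-to-pay, optionally zero-weighting malicious preferences."""
    total = 0.0
    for holders, value_per_holder, malicious in preferences:
        if zero_out_malicious and malicious:
            continue  # malicious preferences contribute nothing to the sum
        total += holders * value_per_holder
    return total

# (number of holders, willingness-to-pay per holder, is the preference malicious?)
for_holocaust = [(6e12, 1.0, True)]            # six trillion Nazis at $1 each
against_holocaust = [(6e6, 100_000.0, False)]  # six million Jews at $100,000 each

# Plain summation reproduces the $5.4 trillion of surplus in favor of the Holocaust.
plain = aggregate(for_holocaust) - aggregate(against_holocaust)
print(f"plain sum: ${plain:,.0f}")

# Zero-weighting the malicious preferences flips the verdict decisively.
filtered = aggregate(for_holocaust, True) - aggregate(against_holocaust, True)
print(f"malicious zeroed out: ${filtered:,.0f}")
```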
In response to your question I have edited my post and changed “a proper CEV” to “a CEV of human morality.”
I am saying that a CEV that extrapolated human morality would generally be utilitarian, but that it would grant a utility value of zero to satisfying what I call “malicious preferences.”
This is because I think that a CEV of human morality would find the concept of malicious preferences to be immoral and discard or suppress it.
Zero is a strange number to have specified there, but then I don’t know the shape of the function you’re describing. I would have expected a non-specific “negative utility” in its place.
Zero is a strange number to have specified there, but then I don’t know the shape of the function you’re describing. I would have expected a non-specific “negative utility” in its place.
You’re probably right; I was typing fairly quickly last night.
I don’t think this is because of scope insensitivity, or because I am not a utilitarian. I endorse utilitarian ethics for the most part, but I think that satisfying “malicious preferences” generates zero or negative utility, no matter how many people hold them. For conflicts of preferences that involve things like disputes over the use of scarce resources, normal utilitarianism applies.
Ah, okay. This sounds somewhat like Nozick’s “utilitarianism with side-constraints”. This position seems about as reasonable as the other major contenders in normative ethics, but some LessWrongers (pragmatist, Will_Sawin, etc.) consider it to be not even a kind of consequentialism.