Yeah my guess is that Eliezer is empirically wrong about humans being broadly sufficiently similar to converge to the same morality upon ideal reflection; I was just writing about that last month in Section 2.7.2 here.
Has Eliezer actually made this claim? (The CEV paper, from what I recall, talks about designing a system that checks whether human values cohere, and shuts down automatically if they don't. This does imply a likely-enough chance of success to be worth building a CEV machine, but I don't know how likely he actually thought it was.)
Oh sorry, it’s in the thing I linked. I was thinking of Eliezer’s metaethics sequence, for example:
When a paperclip maximizer and a pencil maximizer do different things, they are not disagreeing about anything, they are just different optimization processes. You cannot detach should-ness from any specific criterion of should-ness and be left with a pure empty should-ness that the paperclip maximizer and pencil maximizer can be said to disagree about—unless you cover “disagreement” to include differences where two agents have nothing to say to each other.
But this would be an extreme position to take with respect to your fellow humans, and I recommend against doing so. Even a psychopath would still be in a common moral reference frame with you, if, fully informed, they would decide to take a pill that would make them non-psychopaths. If you told me that my ability to care about other people was neurologically damaged, and you offered me a pill to fix it, I would take it. Now, perhaps some psychopaths would not be persuadable in-principle to take the pill that would, by our standards, “fix” them. But I note the possibility to emphasize what an extreme statement it is to say of someone:
“We have nothing to argue about, we are only different optimization processes.”
That should be reserved for paperclip maximizers, not used against humans whose arguments you don’t like.
Hmm, I guess it’s not 100% clear from that quote by itself, but I read the whole metaethics sequence a couple months ago and this was my strong impression—I think a lot of the stuff he wrote just doesn’t make sense unless you include a background assumption that human neurodiversity doesn’t impact values-upon-ideal-reflection, with perhaps (!) an exception for psychopaths.
This is another example—like, it's not super-explicit, it's just the way he lumps humans together, in the context of everything else he wrote. (Sorry if I'm putting words in anyone's mouth.)
It has been a few years since I read that sequence in full, but my impression was that Eliezer thought there were some basic pieces that human morality is made out of, and some common ways of finding and putting those pieces together, though they needn't be exactly the same across people. If you run this process for long enough, using the procedures humans use to construct their values, then you'd end up somewhere in a relatively small space compared to the space of all possible goals, or of all evolutionarily fit goals for superintelligences, etc.
So too for a psychopath. This seems plausible to me. I don't expect a psychopath to wind up optimizing for paperclips on reflection. But I also don't expect to be happy in a world that ranks very high according to a psychopath's values-on-reflection. Plausibly, I wouldn't even exist in such a world.