I don’t think that people valuing eternal torture of other humans is much of a concern, because they don’t value it nearly as much as the people in question disvalue being tortured.
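To make the magnitudes concrete, here is a toy calculation under a naive sum-of-utilities aggregation; every number below is made up purely for illustration and is not taken from any survey or from the CEV paper.

```python
# Toy aggregation (all numbers hypothetical): many people mildly favor the
# torture, a few people intensely disvalue being tortured.
n_supporters = 7_000_000_000
value_per_supporter = 0.001      # mild preference in favor
n_victims = 10
value_per_victim = -1_000_000.0  # intense preference against

aggregate = (n_supporters * value_per_supporter
             + n_victims * value_per_victim)
print(aggregate)  # -3,000,000: the victims' disvalue dominates the sum
```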
Suppose most people agree on valuing the torture of a few people, and only a few people disagree. Would you be OK with the majority’s values outweighing the minority’s, if it’s a large enough majority?
If you’re OK with that, and if this is not specific to the example of torture, then you are effectively saying that you value the extrapolated consensus values of humanity more than your own, even though you don’t know what those values may be. In other words, you value the (unspecified) CEV process, and whatever values it ends up generating, more than any other values you currently hold. Is that so?
Even if you’re OK with that, you’d be vulnerable to a “clone utility monster”: if I can clone myself faster than average, then the values of me and my clones will come to dominate the global population. This seems true for almost any value aggregation process, given a large enough majority (that is, fast enough cloning).
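Here is a toy illustration of that dynamic, assuming a head-count-weighted aggregation and hypothetical growth rates: a faction that doubles each period eventually holds essentially all of the weight, no matter how small it starts.

```python
# Toy model of the "clone utility monster" under head-count weighting.
# All growth rates are hypothetical.
monster = 1.0           # initial size of the fast-cloning faction
rest = 8_000_000_000.0  # everyone else
monster_rate = 2.0      # the faction doubles every period
rest_rate = 1.01        # ~1% growth per period for everyone else

share = monster / (monster + rest)
for period in range(60):
    monster *= monster_rate
    rest *= rest_rate
    share = monster / (monster + rest)

print(f"after 60 periods the faction holds {share:.6f} of the weight")
# With these rates the faction's share passes 0.99 after roughly 40 periods.
```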
No, I would not be okay with it.
I don’t terminally value CEV. I think it would be instrumentally valuable, because scenarios where everyone wants to torture a few people are not that likely. I would prefer that only my own extrapolated utility function control the universe. Unlike Eliezer Yudkowsky, I don’t care that much about not being a jerk. But that is not going to happen.
If this detail from the original paper still stands, the CEV is allowed to modify the extrapolation process. So if there were a threat of everyone having to race to clone themselves as much as possible for more influence, it might modify itself to give clones less weight, or to prohibit cloning.
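The paper leaves the extrapolation dynamic unspecified, so the following is only a sketch of one way clones could be given less weight. It assumes, hypothetically, that each person can be traced back to the original individual they were cloned from, and it splits one unit of weight across each such lineage.

```python
# Hypothetical clone-discounting rule: each lineage (an original person plus
# all of their clones) shares a single unit of weight, so cloning yourself
# does not buy extra influence over the aggregation.
from collections import Counter

def lineage_weights(population):
    """population: list of lineage ids, one entry per living person."""
    counts = Counter(population)
    # each person gets 1 / (size of their lineage); each lineage sums to 1
    return {person: 1.0 / counts[lineage]
            for person, lineage in enumerate(population)}

# Example: lineage "a" has cloned itself three times; "b" and "c" have not.
weights = lineage_weights(["a", "a", "a", "a", "b", "c"])
# Each "a" copy gets 0.25, while "b" and "c" each keep a full 1.0.
```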
Prohibiting these things, and CEV self-modification in general, means optimizing for certain values or a certain outcome. Where do these values come from? From the CEV’s programmers. But if you let certain predetermined values override the (unknown) CEV-extrapolated values, how do you make these choices, and where do you draw the line?
I mean that the self-modification or prohibition would come from the CEV extrapolated from the entire population before anyone starts a clone race, not from something explicitly put in by the programmers.