This isn’t a bug in CEV, it’s a bug in the universe. Once the majority of conscious beings are Dr. Evil clones, then Dr. Evil becomes a utility monster and it gets genuinely important to give him what he wants.
But allowing Dr. Evil to clone himself is bad; it will reduce the utility of all currently existing humans except Dr. Evil.
If a normal, relatively nice but non-philosopher human ascended to godhood, ve would probably ignore Dr. Evil’s clones’ wishes. Ve would destroy the clones and imprison the doctor, because ve would be angry at Dr. Evil for taking the utility-lowering action of cloning himself and would want to punish him.
But everything goes better than expected! Dr. Evil hears a normal human is ascending to godhood, realizes making the clones won’t work, and submits passively to the new order. And rationalists should win, so a superintelligent AI should be able to do at least as well as a normal human by copying normal human methods when they pay off.
So an AI with sufficiently good decision theory could (I hate to say “would” here, because making quick assumptions that an AI would do the right thing is a good way to get yourself killed) use the same logic. Ve would say, before even encountering the world, “I am precommitting that anyone who cloned themselves a trillion times gets all their clones killed. This precommitment will prevent anyone who genuinely understands my source code from having cloned themselves in the past, and will therefore increase utility.” Then ve opens ver sensors, sees Dr. Evil and his clones, and says “Sorry, I’d like to help you, but I precommitted to not doing so,” kills all of the clones as painlessly as possible, and gets around to saving the world.
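To spell out the intended logic, here is a toy sketch in Python. The payoff numbers and the policy names are mine, purely for illustration, and it only models the case where the deterrence actually works, which the replies below dispute. The point is just that a policy chosen before looking at the world can beat post-hoc aggregate-utility maximization, because a would-be cloner who can read the AI’s source code responds to the policy itself:

```python
# Toy model of the precommitment argument (all numbers made up for illustration).
# Two candidate policies, evaluated from *before* the AI observes the world:
#   "appease":   maximize aggregate utility over whoever exists at wake-up time
#   "precommit": refuse to count utility weight gained by mass self-cloning
# Key assumption: Dr. Evil, who understands the AI's source code, only clones
# himself if cloning would actually pay off under the AI's policy.

def dr_evil_clones(policy: str) -> bool:
    """Dr. Evil clones himself iff the AI's policy would reward it."""
    return policy == "appease"

def ex_ante_utility(policy: str) -> float:
    """Outcome value from the perspective of the pre-cloning population."""
    if dr_evil_clones(policy):
        # Clones dominate the aggregate and Dr. Evil gets his wish.
        return -100.0
    return 10.0  # nobody bothers cloning; the AI gets on with saving the world

for policy in ("appease", "precommit"):
    print(policy, ex_ante_utility(policy))
# "precommit" scores higher ex ante precisely because it makes cloning pointless.
```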
“I am precommitting that anyone who cloned themselves a trillion times gets all their clones killed. This precommitment will prevent anyone who genuinely understands my source code from having cloned themselves in the past, and will therefore increase utility.”
Wait, increase utility according to what utility function? If it’s an aggregate utility function where Dr. Evil has 99% weight, then why would that precommitment increase utility?
You’re right. It will make a commitment to stop anyone who tries the same thing later, but it won’t apply it retroactively. The original comment is wrong.
Wait, increase utility according to what utility function?
The current CEV of humanity, or your best estimate of it, I think. If someone forces a choice between killing orphans and letting the world be destroyed, saving the world is the higher-utility option, but we still want to punish the guy who forced that choice on us.
I think that’s where the idea came from, anyway; I agree with Yvain that it doesn’t work.
This isn’t a bug in CEV, it’s a bug in the universe. Once the majority of conscious beings are Dr. Evil clones, then Dr. Evil becomes a utility monster and it gets genuinely important to give him what he wants.
I think that’s wrong. At the very least, I don’t think it matches the scenario in the post. In particular, I think “how many people are there?” is a factual question, not a moral question (and the answer is not an integer).
But the important (and moral) question here is “how do we count the people for utility purposes?” We also need a normative way to aggregate their utilities, and one vote per person would need to be justified separately.
This scenario actually gives us a guideline for aggregating utilities. We need to prevent Dr. Evil from counting more than once.
One proposal is to count people by hours of differing experience, so that if I’ve had 300,000 hours of experience and my clone’s experience differs from mine by one hour, the clone counts as 1⁄300,000 of a person. But if we go by hours of experience, we have the problem that with enough clones Dr. Evil can still amass enough fractional persons to overwhelm Earth’s current population (at one unique hour per clone, on the order of two quadrillion clones would do it).
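As a sanity check on that arithmetic (the 300,000-hour figure is the one above; the population number is my own round assumption), the weighting rule works out as follows:

```python
# Hours-of-experience weighting, spelled out.
# Assumptions for illustration: ~7 billion existing people, 300,000 hours of
# experience shared with the original, 1 unique hour per clone.
shared_hours = 300_000
unique_hours = 1
weight_per_clone = unique_hours / shared_hours  # 1/300,000 of a person

population = 7_000_000_000
clones_needed = population / weight_per_clone
print(f"each clone counts as {weight_per_clone:.2e} of a person")
print(f"clones needed to outweigh everyone else: {clones_needed:.2e}")  # ~2e15
```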
So this indicates that we need to look at the utility functions. If two entities have the same utility function, they should be counted as the same entity, no matter what different experiences they have. This way, the only way Dr. Evil will be able to aggregate enough utility is to change the utility function of his clones, and then they won’t all want to do something evil. Something like using a convergent series for the utility of any one goal might work: if Dr. Evil wants to destroy the world, his clone’s desire to do so counts for 1⁄10 of that, and the next clone’s desire counts for 1⁄100, so he can’t accumulate more than 10⁄9 of his original utility weight.
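And a quick check that the convergent-series weighting really does cap the total (the 1⁄10 ratio is just the example above; any ratio below one gives some finite bound):

```python
# Geometric discounting of copies that share a utility function: the k-th copy's
# desire for the same goal counts for (1/10)**k, so however many clones exist,
# the total weight never exceeds 1 / (1 - 1/10) = 10/9 of the original's.
ratio = 1 / 10
partial = sum(ratio**k for k in range(1_000))  # effectively the infinite sum
print(partial)           # ~1.1111...
print(1 / (1 - ratio))   # closed form: 10/9 ~ 1.1111...
```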
EDIT: Doesn’t work, see Wei Dai below.