Okay, I get the feeling that I might be completely wrong about this whole thing. But prior to saying “oops”, I’d like my position completely crushed, so I don’t have any kind of loophole or a partial retreat that is still wrong. This means I’ll continue to defend this position.
First of all, I got TDT wrong when I read about it here on LW. Oops. It seems that it is not applicable to the problem. Still, I feel like my line of argument holds: If you know that a future FAI will take all actions necessary that lead to its faster creation, you can derive that it will also punish those who knew it would, but didn’t make FAI happen faster.
Yes, if there will one day be an AI (I can’t call it friendly) which decides to punish people who could have done more to bring about a friendly singularity, then it would be advisable to do what one can, right now, in order to bring about a friendly singularity. But this is only one type of possible AI.
I’d call it friendly if it maximizes the expected utility of all humans, and if that involves blackmailing current humans who thought about this, so be it. Suppose the prior probability that a person does X, where X makes FAI happen a minute sooner and thereby generates Y additional utility, is 0.25. If pondering the choices of an FAI, including the choice to punish humans who didn’t speed up FAI development, makes this person more likely to do X (say, now 0.5), then the FAI might punish that human for not doing X (and the human will anticipate this punishment) at a cost of up to 0.25 * Y utility, and the FAI is still friendly. If the AI, however, decides not to punish that human, then either the human’s model of the AI was incorrect or the human correctly anticipated this behaviour, which would mean that the AI is not 100% friendly, since it could have created utility by punishing that human.
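To make the arithmetic in this comment concrete, here is a minimal sketch of where the 0.25 * Y bound comes from. The function names and the value chosen for Y are hypothetical, and the sketch assumes one particular reading of the comment: that the punishment is counted as a cost charged in full against the utility gained by deterrence.

```python
def deterrence_gain(p_before: float, p_after: float, y: float) -> float:
    """Expected utility gained because anticipating punishment raises P(doing X).

    p_before: probability of doing X without anticipating punishment (0.25 in the comment)
    p_after:  probability of doing X when punishment is anticipated (0.5 in the comment)
    y:        utility generated by X (Y in the comment)
    """
    return (p_after - p_before) * y


def policy_is_net_positive(p_before: float, p_after: float, y: float,
                           punishment_cost: float) -> bool:
    """True if the punishment costs no more than the deterrence gains.

    Assumption (not stated in the comment): the punishment is charged in full.
    """
    return punishment_cost <= deterrence_gain(p_before, p_after, y)


if __name__ == "__main__":
    Y = 100.0  # hypothetical value; the comment leaves Y unspecified
    print(deterrence_gain(0.25, 0.5, Y))               # 0.25 * Y
    print(policy_is_net_positive(0.25, 0.5, Y, 25.0))  # punishment at the bound
```

Under this reading, any punishment costing more than (0.5 - 0.25) * Y destroys more utility than the threat created, which is why the bound sits at 0.25 * Y.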
The argument that there are many different types of AGI, including those which reward the actions other AGIs punish, neglects that the probabilities for different types of AI are spread unequally. I, personally, would assign a relatively high probability to FAI (higher than a null hypothesis would suggest), so that the expected utilities don’t cancel out. While we can’t have absolute certainty about the actions of a future AGI, we can guess different probabilities for different mind designs. Bipping AIs might be more likely than Freepy AIs because so many people have donated to the fictional Institute on Bipping AI, whereas there is not even such a thing as a Freepy AI research center. I am uncertain about the value system of a future AGI, but not completely. A future paperclip maximizer is a mind design to which I would assign a low probability, and although the many different AGIs out there might together be more probable than FAI, every single one of them is unlikely compared to FAI, and thus I should work towards FAI.
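The claim that the expected utilities don’t cancel out can be illustrated with a small sketch. All probabilities and payoffs below are made-up numbers chosen only for illustration, and `net_incentive` is a hypothetical helper, not anything from the discussion itself.

```python
def net_incentive(designs: list[tuple[float, float]]) -> float:
    """Sum of probability * payoff over possible AI designs.

    Each entry is (probability of that design, payoff to a person for helping FAI).
    A positive payoff means that design rewards helping (or punishes not helping);
    a negative payoff means it punishes helping.
    """
    return sum(p * payoff for p, payoff in designs)


# Two opposed designs with equal probability: the incentives cancel.
equal = [(0.25, 1.0), (0.25, -1.0)]

# Same payoffs, but one design is considered much more likely: they do not cancel.
unequal = [(0.5, 1.0), (0.125, -1.0)]

if __name__ == "__main__":
    print(net_incentive(equal))
    print(net_incentive(unequal))
```

The cancellation argument only goes through if opposed designs are roughly equally probable; once one design is weighted more heavily, a net incentive remains.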
Where am I wrong? Where is this kind of argument flawed?
If you know that a future FAI will take all actions necessary that lead to its faster creation, you can derive that it will also punish those who knew it would, but didn’t make FAI happen faster.
But punishing them occurs after it has been created, and no action that it performs after it was created can cause it to have been created earlier than it actually was. Therefore such post-singularity punishment is futile and an FAI would not perform it.
The only consideration in this scenario which can actually affect the time of an FAI’s creation is the pre-singularity fear of people who anticipated post-singularity punishment. But any actual future FAI is not itself responsible for this fear, and therefore not responsible for the consequences of that fear. Those consequences are entirely a product of ideas internal to the minds of pre-singularity people, such as ideas about the dispositions of post-singularity AIs.
Aside from the fact that I already changed my mind and came to the conclusion that an FAI won’t punish, I’d still object: If we can anticipate an FAI which does not punish, we wouldn’t feel obliged (or be tempted to feel obliged) to speed up its development. That means that an AI would be better off foreseeably punishing people, and if the AI is friendly, then it has a mind design which maximizes the utility functions of humans. If that involves having a mind design such that people anticipate punishment and thereby speed up its development, so be it. In particular, the fact that we know it’s a friendly AI makes it very easy for us to anticipate its actions, which the AI knows as well. This line of argument still holds; the chain breaks at a weaker link.