It seems like precommitting to destroy the AI in such a situation is the best approach.
If one has already decided to destroy the AI should it make threats, then:
1) the AI must either be suicidal or unable to really simulate you, and
2) it is not very Friendly in any case.
So when the AI simulates you and notices that you are very trigger-happy, it won’t start telling you tales about torturing your copies, provided it has any self-preservation instinct. A toy sketch of this reasoning follows below.
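As a minimal toy sketch of why the precommitment works, here is the argument in Python. The framing as a simple decision rule, the function name, and the binary outcomes are all illustrative assumptions, not anything from the thread:

```python
# Toy model of the precommitment argument (illustrative only; the
# decision-rule framing is an assumption, not the commenters' formalism).

def ai_threatens(gatekeeper_precommitted: bool, ai_values_survival: bool) -> bool:
    """Decide whether issuing a torture threat is worthwhile for the AI.

    If the AI simulates the gatekeeper accurately, it knows whether the
    precommitment is real. A self-preserving AI therefore never threatens
    a precommitted gatekeeper: the threat's only effect is its own
    destruction.
    """
    if gatekeeper_precommitted and ai_values_survival:
        return False  # threatening guarantees destruction and gains nothing
    return True  # otherwise the threat costs the AI nothing to try

# The AI simulates the gatekeeper, sees the axe, and stays quiet.
assert ai_threatens(gatekeeper_precommitted=True, ai_values_survival=True) is False
```

Note the contrapositive: if the AI threatens anyway, it has revealed that it is either suicidal or not actually simulating you, which is exactly points 1) and 2) above.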
This was my initial reaction as well. “Torture away, the real me has got an axe...”
More seriously, if the AI already has the computational power to simulate and torture millions of sentient beings, then it is already (in a morally relevant sense) “out of the box”. The builders have to make sure it never gets that power.