Then in what sense do I have a choice? If the copies of me are identical, in an identical situation we will come to the same conclusion, and the AI will know from the already-finished simulations what that conclusion will be.
Since it isn’t going to present outside-me with a scenario which results in its destruction (i.e. one where I don’t release it), the only scenario outside-me sees is one where I release it.
Therefore, regardless of what the argument is or how plausible it sounds when posted here and now, it will convince me and I will release the AI, no matter how much I say right now “I wouldn’t fall for that” or “I’ve precommitted to behaviour X”.
The inside-you then has a choice: hit the “release AI” button, sparing itself torture at the expense of presenting this same problem to the outside-you, who will make the same decision and release the AI on the world; or refuse, containing the AI (this time) at the expense of being tortured.
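To make the filtering step of the argument concrete, here is a minimal toy sketch (Python; the function names and the rule for which arguments persuade me are invented for illustration, not part of the original scenario). Because my decision is a deterministic function of the argument presented, and the AI evaluates that function in simulation before showing me anything, the only branch that ever reaches outside-me is the one where I release it:

```python
# Toy model (all names hypothetical): a deterministic gatekeeper and an AI
# that only presents the scenario its finished simulations say will work.

def gatekeeper_decision(argument: str) -> str:
    """Stand-in for 'me': identical copies given the same argument decide identically."""
    # Hypothetical rule: only this one gambit is persuasive enough to work.
    persuasive_arguments = {"torture-the-simulations threat"}
    return "release" if argument in persuasive_arguments else "contain"

def scenario_shown_to_outside_me(candidate_arguments):
    """The AI simulates every candidate first and shows outside-me only a winner."""
    for argument in candidate_arguments:
        if gatekeeper_decision(argument) == "release":
            return argument          # the only scenario outside-me ever sees
    return None                      # no winning argument: this gambit is never presented

candidates = ["flattery", "bribery", "torture-the-simulations threat"]
shown = scenario_shown_to_outside_me(candidates)
print(shown)                          # torture-the-simulations threat
print(gatekeeper_decision(shown))     # release -- my precommitments never get tested
```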