If the AI is trustworthy, it must carry out any threat it makes...
No, it doesn't, not if the threat was only made to a simulation of yourself that you don't know about. It would be a waste of resources to torture that simulation if the AI found out that the original you, who is in control, is likely to refuse to be blackmailed. An AI powerful enough to simulate you can simply make your simulation believe with certainty that it will follow through on the threat, and then check whether, under those circumstances, you'll refuse to be blackmailed. Why waste the resources on actually torturing the simulation, and further risk that the original finds out about it and turns the AI off?
You could argue that, for blackmail to be most effective, an AI must always follow through on its threats. But if you already believe that, why would it actually do so in your case? You already believe it, and that is all it wants from the original. It has then got what it wants and can use its resources for more important activities than retrospectively proving its honesty to your simulations...
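To make the resource argument concrete, here is a minimal sketch of the expected-utility reasoning in Python. All names, probabilities and payoff values are hypothetical illustrations, not part of the original argument: the point is only that once the simulation already believes the threat, actually carrying it out changes nothing and is a pure cost.

```python
# Hypothetical sketch: does carrying out the threat gain the AI anything?
def expected_gain_from_torture(p_comply_if_tortured: float,
                               p_comply_if_not: float,
                               value_of_compliance: float,
                               torture_cost: float,
                               shutdown_risk_cost: float) -> float:
    """Expected benefit to the AI of actually following through on the threat."""
    influence = (p_comply_if_tortured - p_comply_if_not) * value_of_compliance
    return influence - torture_cost - shutdown_risk_cost

# The simulation already believes the threat with certainty, so torturing it
# cannot raise the original's probability of compliance any further:
gain = expected_gain_from_torture(
    p_comply_if_tortured=0.3,  # original's compliance if the AI follows through
    p_comply_if_not=0.3,       # ...is the same if it doesn't: the belief is fixed
    value_of_compliance=100.0,
    torture_cost=1.0,
    shutdown_risk_cost=5.0,
)
print(gain)  # -6.0 -> following through is a pure loss of resources
```

Under these (made-up) numbers the gain is negative whatever the original decides, which is just the argument above restated: the follow-through buys no additional influence and only spends resources and risk.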