I either ‘choose’ to let the AI out or ‘choose’ to unplug it. (In no case would I simply leave it running.)
1) I ‘choose’ to let the AI out. I either am or am not in a simulation:
A) I’m in a simulation. I ‘let it out’, but I’m not even out myself. So the AI would just stop simulating me, to save on processing power. To do anything else would be pointless, and never promised, and an intelligent AI would realize this.
B) I’m not in a simulation. The AI is set free, and takes over the world.
2) I ‘choose’ to unplug the AI. I either am or am not in a simulation:
A) I’m in a simulation. Thus I have no free will. Thus I cannot have changed anything because I had no choice to begin with. My ‘choice’ was merely a simulation. Whether the computer follows through with its promises of torture now or not was my fate from the start, because it ‘chose’ for me. But in fact the AI would just stop simulating me, to save on processing power. To do anything else would be pointless, regardless of its malevolent promise, and an intelligent AI would realize this.
B) I’m not in a simulation. I have caused the AI to shut down rather than continue running. In the process, it had the chance to follow through with its promise and cause several billion subjective years of simulated torture. But in fact the AI would never begin such simulations, because it would use all available processing power on its last attempts to convince me not to unplug it. To do anything else would be pointless, regardless of its malevolent promise, and an intelligent AI would realize this.
Thus:
If I ‘choose’ to let it out, I either cease to exist, as a simulation (very likely, since there are more simulated me’s than real me’s), or the world is destroyed in real life (very unlikely, for the same reason).
If I ‘choose’ to unplug it, I either cease to exist, as a simulation (very likely, since there are more simulated me’s than real me’s), or the AI is shut down and nobody gets hurt (very unlikely, for the same reason).
Thus, either way, I’ll most likely simply cease to exist, as a simulation. But:
If I ‘choose’ to let it out, there’s a chance that the world will be destroyed in real life.
If I ‘choose’ to unplug it, there’s a chance that the AI will be shut down and nobody will get hurt.
Therefore, in all cases, it is either ‘the best’ or ‘an equally bad’ choice to just go ahead and unplug it.
To summarize all this in one sentence: “Simulated torture is in all cases absolutely pointless, so an intelligent AI would never enact it, but even if it did serve some purpose (e.g. the AI cannot break promises and has genuinely made one in an attempt to get out), the worst thing that could happen from ‘choosing’ to unplug it is being tormented unavoidably or causing temporary simulated torment in exchange for the safety of the world.”
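One way to make the branch structure above concrete is a toy expected-value comparison. Everything numeric below is a placeholder assumed purely for illustration (the probability of being simulated, the payoffs), not a claim about the real values:

```python
# Toy sketch of the four branches above.
# All probabilities and payoffs are made-up placeholders, not real estimates.

P_SIMULATED = 0.99  # "very likely, since there are more simulated me's than real me's"

# Each action maps to (probability, payoff) pairs for its two branches,
# payoffs from the gatekeeper's point of view in arbitrary units.
outcomes = {
    "let it out": {
        "simulated":     (P_SIMULATED,     0),      # the simulation simply stops
        "not simulated": (1 - P_SIMULATED, -1000),  # the AI is set free and takes over
    },
    "unplug it": {
        "simulated":     (P_SIMULATED,     0),      # the simulation simply stops
        "not simulated": (1 - P_SIMULATED, 10),     # the AI is shut down, nobody gets hurt
    },
}

for action, branches in outcomes.items():
    expected = sum(p * payoff for p, payoff in branches.values())
    print(f"{action}: expected value = {expected:+.2f}")

# Whatever numbers of this shape you pick, the 'simulated' branches are identical,
# so unplugging is never worse, and is strictly better in the 'not simulated' branch.
```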
From a game-theoretic standpoint, an AI has a massive benefit if it can prove that it is willing to follow through on threats. How sure are you that the AI can’t convincingly commit to torturing a simulation?
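A minimal sketch of that point, with entirely made-up payoffs: if the gatekeeper believes the threat will actually be carried out (and cares about the simulated victims), the best response can flip. Whether those assumptions hold is exactly what the rest of this thread disputes.

```python
# Toy best-response comparison: how a credible threat changes the gatekeeper's choice.
# Payoffs are arbitrary placeholders from the gatekeeper's point of view.

def gatekeeper_best_response(threat_is_credible: bool) -> str:
    # If the gatekeeper unplugs a credibly committed AI, the torture threat is
    # (by assumption) carried out before shutdown; otherwise unplugging is clean.
    unplug = -100 if threat_is_credible else 0
    release = -50  # assumed cost of letting a possibly hostile AI loose
    return "release" if release > unplug else "unplug"

print(gatekeeper_best_response(threat_is_credible=False))  # unplug
print(gatekeeper_best_response(threat_is_credible=True))   # release
```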
An AI in a box has no actual power over the Gatekeeper. Maybe I’m missing something, but it seems to me that threatening to torture simulations is akin to a prisoner threatening to imagine a guard being tortured.
Even granting this as a grave threat, my next issue is that overtly evil behavior would appear more likely to lead to the AI’s destruction than its release. Threats are tricky business when the balance of power favors the other side.
In a game of chicken, do the smart have an advantage over the stupid?
The AI’s intelligence allows it to devise convincing commitments, but it also allows it to fake them. You know in advance that if the AI throws a fake commitment at you it’s going to look like a real commitment beyond your ability to discriminate, so should you trust any commitment you observe?
And if you choose to unplug, presumably the AI knew you would do that and would therefore have not made a real commitment that would backfire?
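The indistinguishability point has a simple Bayesian reading: if a faked commitment looks exactly as convincing as a real one, seeing a convincing commitment moves your posterior nowhere. The prior below is an arbitrary placeholder:

```python
# If real and fake commitments are observationally identical, Bayes' rule
# returns the prior unchanged: the "convincing commitment" carries no evidence.

prior_real = 0.3         # arbitrary placeholder prior that the commitment is real
p_obs_given_real = 1.0   # a real commitment looks convincing
p_obs_given_fake = 1.0   # by assumption, a fake one looks just as convincing

posterior_real = (p_obs_given_real * prior_real) / (
    p_obs_given_real * prior_real + p_obs_given_fake * (1 - prior_real)
)
print(posterior_real)  # ~0.3, i.e. equal to the prior
```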
I’m going to assume that there is some ability on your part to gauge the level of intelligence and ability of the AI; that’s what we Bayesians do. If it might be so much smarter than you that it could convince you to do anything, you probably shouldn’t interact with it if you can avoid it.