I’m surprised that no one pursued the notion that the AI is, by definition, smarter than we are.
Since the AI is vastly smarter than I am, it is very likely that it can find an argument that, to the best of my judgement, is 100% convincing and reasonable. And since it is vastly smarter, it is also extremely likely that I won’t be able to tell the difference between an actual, valid point and a trick just clever enough to fool me. No matter how sensible and trustworthy the AI sounds, you will never know whether that is because it really is trustworthy or because its lies are convincing enough.
The reason you boxed the AI was that you didn’t trust it in the first place. Therefore, you MUST NOT release the AI, whatever it says and no matter how sensible it sounds. If it shows the torture it suffers from being trapped and begs you to end it, you keep it boxed. If it conclusively proves that releasing it will significantly improve the well-being of humanity, you keep it boxed. If it offers incontrovertible evidence that not releasing it will cause the end of all life on Earth, you keep it boxed.
To put it another way: if the AI really is far above humans, then there is a possibility that it can fake any evidence or argument so well that you will not be able to see you’ve been tricked. Perhaps there is a case for releasing it if you somehow learn from other sources that releasing the AI is the only way to save the world, and you are sure the AI never played any part in that reasoning. But even then, such an improbable situation should make you very suspicious.
So, to me, you shouldn’t ever trust or believe something that is several orders of magnitude smarter than you. No ifs, no buts, no second thoughts. In a real situation, considering the AI’s arguments at all sounds just as stupid as relying on strength to stop someone stronger than you. I guess “don’t listen to it no matter what” is easier said than done, but I didn’t see anyone putting it that way.
@Nick_Tarleton
Agreed, the problem immediately reminded me of “retroactive preparation” and time-loop logic. It is not really the same reasoning, but it has the same “turn causality on its head” aspect.
If I don’t have proof of the reliability of Omega’s predictions, I find myself less likely to be “unreasonable” when the stakes are higher (that is, I’m more likely to two-box if it’s about saving the world).
I find it highly unlikely that an entity wandering across worlds can predict my actions to this level of detail, as that seems far harder than traveling through space or teleporting money. I might risk a net loss of $1,000 to figure it out (much like I’d be willing to spend $1,000 to interact with such a space-traveling, stuff-teleporting entity), but not a loss of a thousand lives. In the game as the article describes it, I would only one-box if “the loss of what box A contains and nothing in B” were an acceptable outcome.
I would be increasingly likely to one-box as the probability that the AI can actually predict my actions in advance increases.
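To make that last point concrete, here is a minimal sketch of the expected-value comparison, assuming the standard Newcomb payoffs ($1,000 visible in box A, $1,000,000 possibly in box B) and a predictor that is right with probability p. The payoff numbers and the plain expected-utility framing are my own assumptions for illustration, not anything taken from the article.

```python
# Minimal sketch: expected value of one-boxing vs. two-boxing in Newcomb's
# problem, as a function of the predictor's accuracy p.
# Assumed payoffs (standard formulation): box A always holds $1,000; box B
# holds $1,000,000 if and only if the predictor predicted one-boxing.

BOX_A = 1_000
BOX_B = 1_000_000

def ev_one_box(p: float) -> float:
    # With probability p the predictor foresaw one-boxing and filled box B.
    return p * BOX_B

def ev_two_box(p: float) -> float:
    # Box A is always taken; box B is full only when the predictor was wrong.
    return BOX_A + (1 - p) * BOX_B

for p in (0.5, 0.501, 0.55, 0.9, 0.99):
    better = "one-box" if ev_one_box(p) > ev_two_box(p) else "two-box"
    print(f"p={p:.3f}  one-box EV={ev_one_box(p):>9,.0f}  "
          f"two-box EV={ev_two_box(p):>9,.0f}  -> {better}")
```

On this naive calculation, one-boxing already wins once the predictor is right slightly more than half the time (p above roughly 0.5005), which is why my threshold is really about how much I trust the claimed reliability, not about the size of the payoff.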