If the gatekeepers have such a high prior that the AI is hostile, why are we even letting it talk?
The point of the game is that there are people who think that boxing is a sufficient defence against unfriendliness, and to demonstrate that they are wrong in a way more convincing than mere verbal argument.
What are we expecting to learn from such a conversation?
In role, the gatekeeper expects to get useful information from a potentially hostile superintelligent being. Out of role, Eliezer hopes to demonstrate to the gatekeeper player that this cannot be done.
The point of the game is that there are people who think that boxing is a sufficient defence against unfriendliness, and to demonstrate that they are wrong in a way more convincing than mere verbal argument.
In role, the gatekeeper expects to get useful information from a potentially hostile superintelligent being. Out of role, Eliezer hopes to demonstrate to the gatekeeper player that this cannot be done.