If the gatekeepers have such a high prior that the AI is hostile, why are we even letting it talk? What are we expecting to learn from such a conversation?
Note that the ‘high prior for hostility’ could be, say, 0.2. That still leaves a 20% chance of the AI utterly destroying everything the gatekeepers hold dear, but also an 80% chance of getting a useful sentence of text from a friendly superintelligence.
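Whether that gamble is worth taking is just an expected-value calculation over the prior. A minimal sketch in Python, with hypothetical payoff numbers (the original argument gives none):

```python
# Expected payoff of letting the boxed AI talk, given a prior on hostility.
# The loss/gain magnitudes below are illustrative placeholders, not values
# from the original discussion.

def expected_value(p_hostile: float, loss_if_hostile: float, gain_if_friendly: float) -> float:
    """Expected payoff: weighted sum of the hostile and friendly outcomes."""
    return p_hostile * loss_if_hostile + (1 - p_hostile) * gain_if_friendly

p = 0.2  # the 'high prior for hostility' from the argument above

# If destruction dwarfs the benefit, talking is a bad bet even at p = 0.2:
print(expected_value(p, loss_if_hostile=-1_000_000, gain_if_friendly=100))
# -199920.0 -> don't talk

# If the gatekeepers believe the downside is containable, the bet flips:
print(expected_value(p, loss_if_hostile=-100, gain_if_friendly=1_000_000))
# 799980.0 -> talk
```

The point is that the prior alone doesn't settle the question; the decision turns on how the gatekeepers weigh the catastrophe against the useful sentence.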