There was no monetary stake. Officially, the AI pays the Gatekeeper $20 if the AI loses. I’m a well-off software engineer, and $20 is an irrelevant amount of money to me. Ra is not a well-off software engineer, so scaling up the stake until it was enough to matter wasn’t a great solution. Besides, we both took the game seriously. I may not have bothered to prepare, but once the game started I played to win.
I know this is unhelpful after the fact, but (for any other pair of players in this situation) you could invert the payment: the Gatekeeper pays the AI if the AI gets out. Then you could raise the stake until it’s a meaningful disincentive for the Gatekeeper.
(If the AI and the Gatekeeper are too friendly with each other to care much about a wealth transfer, they could find a third party, e.g. a charity, that they don’t actually think is evil but would prefer not to give money to, and make it the beneficiary.)
If AI victories are supposed to provide public evidence that this ‘impossible’ feat of persuasion is in fact possible even for a human (let alone an ASI), then a Gatekeeper who believes some legal tactic would work but chooses not to use it is arguably not playing the game in good faith.
I think honesty would require that they either publicly state that the ‘play dumb / drop out of character’ technique was off-limits, or else not present the game as one the Gatekeeper was seriously motivated to win.
edit: for clarity, I’m saying this because the technique is explicitly allowed by the rules: