If you interpret it that way, the exception swallows the rule. It would mean the entire “the AI player controls the results of simulated tests” rule can be completely negated, since the Gatekeeper player could just say “I’m going to have the Gatekeeper act as though the simulated test failed, even though you say it succeeded.”
And indeed this seems true. I think Eliezer included the non-rule anyway to reduce the chance of unrealistic behavior, such as the Gatekeeper player changing the scenario mid-game, or derailing the experiment with an argument over something a real Gatekeeper and AI could just settle.