I will note that the AI box experiment’s conditions expressly forbid a secure environment [i.e. one with inspection tools that cannot be manipulated by the AI]:
> the results seen by the Gatekeeper shall again be provided by the AI party, which is assumed to be sufficiently advanced to rewrite its own source code, manipulate the appearance of its own thoughts if it wishes, and so on.
That is because verifiable inspection isn’t the part of the AI safety question the AI box experiment is designed to test. For the purposes of the experiment, the rules effectively say, “sure, you might catch the AI in a lie, but assuming you don’t—”