Fourth, you can’t possibly be relying on tricking the gatekeeper into thinking incorrectly. That would require you to have spotted something that you could feel confident other people working in the field would not have spotted, and would not spot, despite having been warned ahead of time to be wary of trickery, and despite having, in case of confusion, the fallback position of just saying “no”.
I think the space of things that an AI could trick you into thinking incorrectly about (Edit: and that could also be used to get the AI out of the box) is bigger than AI researchers can be relied on to have explored, and two hours of Eliezer “explaining” something to you (subtly sneaking tricks into your understanding of it) could give you false confidence in your understanding of it.