My response to the “get the AI to tell us what questions we need to ask” proposal is that it fails for multiple reasons, any one of which is sufficient for failure on its own. One of them is the verifiability issue. Another is the Gell-Mann Amnesia thing (which you could view as just another frame on the verifiability issue, but up a meta level). Another is the “get what we measure” problem.
Another failure mode which this post did not discuss is the Godzilla Problem. In the frame of this post: in order to work in practice, the iterative design loop needs to be able to self-correct; if we make a mistake at one iteration, it must be fixable at later iterations. “Get the AI to tell us what questions we need to ask” fails that test; just one iteration of acting on malicious advice from an AI can permanently break the design loop.