Second, it is possible to robustly verify the outputs of a superhuman intelligence without possessing superhuman intelligence yourself.
Why do you believe that a superhuman intelligence wouldn’t be able to deceive you by producing outputs that look correct instead of outputs that are correct?
I don’t have the specifics, but this is a natural tendency of many problems: verifying a solution is easier than producing it. We might also build systems that require outputs to be mathematically verified, or that reject solutions whose consequences are hard to understand.
Davidad’s plan involves one plausible way of doing that.
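The "verification is easier than generation" asymmetry above can be sketched with a toy example. Here is a minimal illustration using subset-sum (the problem choice and all names are hypothetical, purely to show the gap): checking a proposed certificate takes time linear in its size, while finding one by brute force takes exponential time in the worst case.

```python
from itertools import combinations

def verify(nums, target, certificate):
    """Checking a proposed solution is cheap: roughly O(len(certificate))."""
    return all(x in nums for x in certificate) and sum(certificate) == target

def solve(nums, target):
    """Finding a solution is expensive: brute force tries up to 2^n subsets."""
    for r in range(len(nums) + 1):
        for subset in combinations(nums, r):
            if sum(subset) == target:
                return list(subset)
    return None

nums = [3, 34, 4, 12, 5, 2]
cert = solve(nums, 9)         # exponential-time search
print(verify(nums, 9, cert))  # linear-time check
```

The hope expressed in the dialogue is analogous: even if a weaker party cannot run `solve` on hard problems, it can still run `verify` on whatever certificate a stronger party hands over, provided the system forces outputs into a checkable form.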