My intuition is that, first, it is at least feasible to align a human-level intelligence with the “obvious” methods that fail for superintelligence, and then run it faster to produce superhuman output.
Second, it is also possible to robustly verify the outputs of a superhuman intelligence without superhuman intelligence.
And third, there is a lot of value to be captured from narrow AIs that don’t have deceptive capabilities but are very good at, say, solving math.
Why do you believe that a superhuman intelligence wouldn’t be able to deceive you by producing outputs that look correct instead of outputs that are correct?
I don’t have the specifics, but this is just a natural tendency of many problems: verification is easier than coming up with the solution. Also, maybe there are systems where we can require the output to be mathematically verified, or reject solutions whose outcomes are hard to understand.
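To make that asymmetry concrete, here is a minimal Python sketch (my own illustration, not from the discussion; the problem, the numbers, and names like `find_subset`/`verify_subset` are hypothetical): for subset-sum, producing a solution takes an exhaustive search that blows up with input size, while checking a claimed solution is a cheap linear scan.

```python
from itertools import combinations

def find_subset(nums: list[int], target: int) -> tuple[int, ...] | None:
    """Generation: exhaustive search, exponential in len(nums)."""
    for r in range(1, len(nums) + 1):
        for combo in combinations(nums, r):
            if sum(combo) == target:
                return combo
    return None

def verify_subset(nums: list[int], target: int, claimed: tuple[int, ...]) -> bool:
    """Verification: a linear-time check of a claimed solution.
    We never need to trust the solver, only this check."""
    remaining = list(nums)
    try:
        for x in claimed:
            remaining.remove(x)  # fails if the claim uses unavailable elements
    except ValueError:
        return False
    return sum(claimed) == target

nums = [14, 7, 23, 5, 41, 8, 19]
target = 53
solution = find_subset(nums, target)          # expensive search
assert solution is not None
assert verify_subset(nums, target, solution)  # cheap, independent check
```

The same pattern is what a “require a mathematical proof” scheme banks on: the expensive, possibly untrusted search is acceptable as long as the cheap check is trusted.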
Davidad’s plan involves one plausible way of doing that.