My intuition is that, first, it is at least feasible to align a human-level intelligence with the “obvious” methods that fail for superintelligence, and then run it faster to produce superhuman output.
Second, it is also possible to robustly verify the outputs of a superhuman intelligence without superhuman intelligence.
And third, there is a lot of value to be captured from narrow AIs that don’t have deceptive capabilities but are very good at, say, solving math.
Why do you believe that a superhuman intelligence wouldn’t be able to deceive you by producing outputs that look correct instead of outputs that are correct?
I don’t have the specifics, but this is just a natural tendency of many problems: verification is easier than coming up with the solution. Also, maybe there are systems where we can require the output to be mathematically verified, or reject solutions whose outcomes are hard to understand.
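To make that asymmetry concrete, here is a minimal Python sketch (my own illustration, not from the discussion; the problem, the numbers, and names like `find_subset`/`verify_subset` are hypothetical): for subset-sum, producing a solution takes an exhaustive search that blows up with input size, while checking a claimed solution is a cheap linear scan.

```python
from itertools import combinations

def find_subset(nums: list[int], target: int) -> tuple[int, ...] | None:
    """Generation: exhaustive search, exponential in len(nums)."""
    for r in range(1, len(nums) + 1):
        for combo in combinations(nums, r):
            if sum(combo) == target:
                return combo
    return None

def verify_subset(nums: list[int], target: int, claimed: tuple[int, ...]) -> bool:
    """Verification: a linear-time check of a claimed solution.
    We never need to trust the solver, only this check."""
    remaining = list(nums)
    try:
        for x in claimed:
            remaining.remove(x)  # fails if the claim uses unavailable elements
    except ValueError:
        return False
    return sum(claimed) == target

nums = [14, 7, 23, 5, 41, 8, 19]
target = 53
solution = find_subset(nums, target)          # expensive search
assert solution is not None
assert verify_subset(nums, target, solution)  # cheap, independent check
```

The same pattern is what a “require a mathematical proof” scheme banks on: the expensive, possibly untrusted search is acceptable as long as the cheap check is trusted.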
Davidad’s plan involves one plausible way of doing that.