Lao Mein comments on Take 9: No, RLHF/IDA/debate doesn’t solve outer alignment.

Lao Mein 14 Dec 2022 4:32 UTC
1 point
0
What about the toy version of the alignment-via-debate problem, where two human experts try to convince a human layman about a complex issue that the layman lacks the biological capability to fully understand (e.g. a 90 IQ layman and the Poincaré conjecture)? Have experiments been run on this? I just don’t see how someone who can’t “get” calculus after many years of trying can separate good arguments from bad ones in a field far beyond their ability to understand.
- Aaron_Scher 20 Dec 2022 21:57 UTC
  3 points
  0
  You might look here for more info: https://www.alignmentforum.org/posts/PJLABqQ962hZEqhdB/debate-update-obfuscated-arguments-problem