I think the Go example really gets to the heart of why Debate doesn’t cut it.
Your comment is an argument against using Debate to settle moral questions. However, what if Debate is trained on physics and/or math questions, with the eventual goal of asking “what is a provably secure alignment proposal?”
Good question. There’s a big roadblock to your idea as stated, which is that asking something to define “alignment” is a moral question. But suppose we sorted out a verbal specification of an aligned AI and had a candidate FAI coded up—could we then use Debate on the question “does this candidate match the verbal specification?”
I don’t know; I think it still depends on how bad humans are as judges of arguments. We’ve made the domain more objective, but maybe there’s some policy of argumentation that still wins by what we would consider cheating. I can imagine being convinced that it would work by seeing Debates play out with superhuman litigators, but since that’s a very high bar, maybe I should apply more creativity to my expectations.
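For concreteness, here is a minimal sketch of the kind of protocol we’re discussing, loosely following the two-debater, single-judge game from the debate proposal. Every name in it (Transcript, run_debate, the judge callable) is a hypothetical illustration I’m inventing for this comment, not a real API:

```python
from dataclasses import dataclass, field

@dataclass
class Transcript:
    """The question plus the arguments made so far, visible to everyone."""
    question: str
    arguments: list[str] = field(default_factory=list)

def run_debate(question, argue_yes, argue_no, judge, rounds=4):
    """Alternate arguments between the two sides, then ask the judge.

    argue_yes / argue_no: callables that take the Transcript so far and
    return a new argument string for their side.
    judge: callable that takes the final Transcript and returns "yes" or "no".
    The game is zero-sum: exactly one side wins the judge's verdict.
    """
    transcript = Transcript(question)
    for i in range(rounds):
        side = argue_yes if i % 2 == 0 else argue_no
        transcript.arguments.append(side(transcript))
    # The crux of this thread: the verdict is only as trustworthy as the
    # judge. A debater whose policy wins by exploiting judge biases
    # ("cheating") wins this game just as surely as one that wins on the
    # merits of the question.
    return judge(transcript)
```

Making the domain more objective changes what the debaters argue about, but the last line is unchanged: everything still bottoms out in the judge.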
suppose we sorted out a verbal specification of an aligned AI and had a candidate FAI coded up—could we then use Debate on the question “does this candidate match the verbal specification?”
I’m less excited about this, and more excited about candidate training processes or candidate paradigms of AI research (for example, solutions to embedded agency). I expect that there will be a large cluster of techniques which produce safe AGIs; we just need to find them, which may be difficult, but hopefully less so with Debate involved.