Agreed that there are important subtleties here. In this post, I am really just using the safety-via-debate setup as an intuitive case to get us thinking about why we generally seem to trust certain algorithms running in the human brain to adjudicate hard evaluative tasks related to AI safety. I don’t mean to make any especially specific claims about safety-via-debate as a strategy (in part for precisely the reasons you specify in this comment).