The scope of our argument seems to have grown beyond what a single comment thread is suitable for.
“AI safety via debate” predates “Writeup: Progress on AI Safety via Debate” by 2 years, so the latter post should be more up to date. I think that post does a good job of considering potential problems. My issue is that the problems & assumptions it notes can’t be handled well, make the approach very limited in what it can do for alignment, and aren’t really dealt with by “Doubly-efficient debate”. I don’t think such debate protocols are totally useless, but they’re certainly not a “solution to alignment”.