bhauth comments on AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work

bhauth 26 Aug 2024 8:29 UTC
3 points
The scope of our argument seems to have grown beyond what a single comment thread is suitable for.

“AI safety via debate” was published 2 years before “Writeup: Progress on AI Safety via Debate”, so the latter post should be more up-to-date. I think that post does a good job of considering potential problems; the issue is that I think the noted problems & assumptions can’t be handled well, that they make the approach very limited in what it can do for alignment, and that they aren’t really dealt with by “Doubly-efficient debate”. I don’t think such debate protocols are totally useless, but they’re certainly not a “solution to alignment”.