interstice comments on Alignment Might Never Be Solved, By Humans or AI

interstice 8 Oct 2022 3:08 UTC
3 points
Hmm, I think it’s possible that sparser models will be much easier to verify, but it hardly seems inevitable. It’s certainly not clear that sparser models will be so much more interpretable that alignment becomes asymptotically cheap.