Hmm, I think it’s possible that sparser models will be much easier to verify, but it hardly seems inevitable. It’s certainly not clear that sparser models will be so much more interpretable that alignment becomes asymptotically cheap.