(Placeholder: I think this view of alignment/model internals is wrongheaded in a way that invalidates the conclusion, but I don’t have time to leave a meaningful reply now. Maybe we should hash this out sometime at Lighthaven.)