nostalgebraist comments on Interpreting Preference Models w/​ Sparse Autoencoders