Joseph Bloom comments on Interpreting Preference Models w/ Sparse Autoencoders