NaiveTortoise comments on New safety research agenda: scalable agent alignment via reward modeling