I have misc other takes on which current safety work is good vs useless, but whether the work involves feedback/approval or RLHF isn’t much signal either way.
(If anything, I get somewhat annoyed when people skip comparing to baselines without a principled reason for doing so. E.g., inventing new ways of doing training without comparing to normal training.)