Relatedly, I expect the delay from needing to build extra infrastructure before training much larger LLMs will differentially slow capabilities progress, acting somewhat like a pause (or a series of pauses). It would be nice to exploit this by enrolling more safety people to maximally elicit newly deployed capabilities for safety research, and potentially be uplifted by them. (Anecdotally, I suspect I'm already being uplifted at least a bit as a safety researcher by using Sonnet.)
Also, I think it's much harder to pause or slow down capabilities than to accelerate safety, so more of the community's focus should go to the latter.
And for now, it's fortunate that inference scaling relies on CoT and similar intermediate outputs that are differentially transparent (compared to model internals), which probably makes it a safer way of eliciting capabilities.