(This also has implications for automating AI safety research).
To spell it out more explicitly, the current way of scaling inference (CoT) seems pretty good vs. some of the most worrying threat models, which often depend on opaque model internals.
To spell it out more explicitly, the current way of scaling inference (CoT) seems pretty good vs. some of the most worrying threat models, which often depend on opaque model internals.