These frontier models could still be vulnerable to stealth (e.g. “sleeper agent”) attacks, specialist models, and stealth attacks by specialist models. The balance depends on the ability gap – if the top model is way ahead of others, then maybe defence dominates attack efforts. But a big ability gap does not seem to be playing out, instead there are several frontier models near-frontier, and lots of (more or less) open source stuff not far behind.
These frontier models could still be vulnerable to stealth (e.g. “sleeper agent”) attacks, specialist models, and stealth attacks by specialist models. The balance depends on the ability gap – if the top model is way ahead of others, then maybe defence dominates attack efforts. But a big ability gap does not seem to be playing out, instead there are several frontier models near-frontier, and lots of (more or less) open source stuff not far behind.