I would expect that some amount of good safety research is of the form, “We tried several ways of persuading several leading AI models to give accurate instructions for breeding antibiotic-resistant bacteria. Here are the ways that succeeded, here are some first-level workarounds, here’s how we beat those workarounds...”: in other words, stuff that would be dangerous to publish. In the most extreme cases, a mere title (“Telling the AI it’s writing a play defeats all existing safety RLHF” or “Claude + Coverity finds zero-day RCE exploits in many codebases”) could be dangerous.
That said, some large amount should be publishable, and 5 papers does seem low.
Though maybe they’re not making an effort to distinguish what’s safe to publish from what’s not, and erring towards assuming the latter? (Maybe someone set a policy of “Before publishing any safety research, you have to get Important Person X to look through it and/or go through some big process to ensure publishing it is safe”, and the individual researchers are consistently choosing “Meh, I have other work to do, I won’t bother with that” and therefore not publishing?)