Key Findings
This research involving multiple LLMs indicates that biological weapon attack planning currently lies beyond the capability frontier of LLMs as assistive tools. The authors found no statistically significant difference in the viability of plans generated with or without LLM assistance.
This research did not measure the distance between the existing LLM capability frontier and the knowledge needed for biological weapon attack planning. Given the rapid evolution of AI, it is prudent to monitor future developments in LLM technology and the potential risks associated with its application to biological weapon attack planning.
Although the authors identified what they term unfortunate outputs from LLMs (in the form of problematic responses to prompts), these outputs generally mirror information readily available on the internet, suggesting that LLMs do not substantially increase the risks associated with biological weapon attack planning.
To enhance possible future research, the authors would aim to increase the sensitivity of these tests by expanding the number of LLMs tested, involving more researchers, and removing unhelpful sources of variability in the testing process. Those efforts will help ensure a more accurate assessment of potential risks and offer a proactive way to manage the evolving measure-countermeasure dynamic.
The linkpost is to the actual report; see also their press release.
Having just read through this, one key point that I haven't seen people mentioning is that the results are for LLMs that need to be jailbroken.
So these results are more relevant to models released over an API than to open-source releases, where you could just fine-tune away the safeguards or download a model without safeguards in the first place.
The original title of this post is "RAND doesn't believe current LLMs are helpful for bioweapons development". I don't think it makes sense to ascribe beliefs this specific to an entity as messy and big as RAND. I changed the title to something that tries to be informative without making as strong a presumption. (For link posts to posts by off-site authors I take more ownership over how a post is titled; I wouldn't change the title if the author of the report had created it.)
Thanks! I like your title more :)
I agreed and upvoted the action here. However, I think I might have made an even stronger edit, to "A RAND report...", to better emphasize the report over the organization within which the report's authors work.
Some interesting takeaways from the report:
Access to LLMs (in particular, LLM B) slightly reduced the performance of some teams, though not by a statistically significant level:
Planning a successful bioterrorism attack is intrinsically challenging:
Anecdotally, the LLMs were not that useful due to a few common reasons: refusing to comply with requests, giving inaccurate information, and providing vague or unhelpful information.
I noted that the LLMs don’t appear to have access to any search tools to improve their accuracy. But if they did, they would just be distilling the same information as what you would find from a search engine.
More speculatively, I wonder if those concerned about AI biorisk should be less worried about run-of-the-mill LLMs and more worried about search engines using LLMs to produce highly relevant and helpful results for bioterrorism questions. Google search results for "how to bypass drone restrictions in a major U.S. city?" are completely useless and irrelevant, despite sharing keywords with the query. I'd imagine that irrelevant search results may be a significant blocker for many steps of the process to plan a feasible bioterrorism attack. If search engines were good enough that they could produce the best results from written human knowledge for arbitrary questions, that might do more to make bioterrorism accessible than bigger LLMs would.
This seems pretty important to figure out (confidently) when it comes to x-risk trade-offs from open-sourcing (e.g. I pretty much buy the claims here: Open source AI has been vital for alignment).
I’m interested in whether RAND will be given access to perform the same research on future frontier AI systems before their release. This is useful research, but it would be more useful if applied proactively rather than retroactively.
This is one area where I hope the USG will be able to exert coercive force to bring companies to heel. Early access evals, access to base models, and access to training data seem like no-brainers from a regulatory POV.
Ok, so LLMs don’t give an advantage in bioterrorism planning to a team of RAND researchers. Does it mean they don’t give an advantage to actual terrorists, who are notoriously incompetent? https://gwern.net/terrorism-is-not-effective#sn17
I think you’re misrepresenting Gwern’s argument. He’s arguing that terrorists are not optimizing for killing the most people. He makes no claims about whether terrorists are scientifically incompetent.
I agree that it’s his main point; however, he’s also making an observation that most terrorists are incompetent, impulsive, have poor preparation and planning, and choose difficult forms of attacks when better options are available. The post has several anecdotes illustrating that.
He believes the incompetence is caused by terrorists acting on social incentives instead of optimizing for their stated goals. However, what if some terrorist group has one earnest terrorist, or what if the chatbot provides the social encouragement needed to spur a terrorist to action while simultaneously suggesting a more effective strategy? There are also lone-wolf terrorists who, while more practical, are limited to their own ideas, and so are probably less competent than a whole team of researchers.
Thanks for the link post!
Edit: looks like habryka had the same thought slightly before I did.
Nitpick: I think your title is perhaps slightly misleading; here's an alternative title that feels slightly more accurate to me:
RAND experiment found current LLMs aren’t helpful for bioweapons attack planning
In particular:
I don’t think (based on what I’ve quickly read) that RAND made a statement about their overall belief.
The experiments are specifically related to "biological weapon attack planning", which might not cover all phases of development (I'm overall unsure exactly what fraction of bioweapons development this corresponds to). Regardless, it certainly is considerable evidence that current LLMs aren't non-trivially helpful with any part of bioweapons development, even if it doesn't test this directly.
It's really good to see someone as credible and well-resourced as RAND doing a fairly large and well-designed study on this. I'm not hugely surprised by the results for last year's models (and indeed this echoes some smaller and more preliminary red-teaming estimates from Anthropic). As the report clearly notes, once improved models (GPT-(4.5 or 5), Claude 3, Gemini Ultra+) are developed this year, these results could change, possibly dramatically, so I would very much hope the frontier labs are having RAND rerun this study on their new models before they're released, not after.
The most obvious mitigation to attempt first here is to try to filter the training set so as to give the base LLM specific, targeted deficits in bioweapons-related biological and operational skills and knowledge; it seems likely that this information is fairly concentrated in specific parts of the Internet and other training material. So I think it could be very valuable to figure out which parts of the pretraining set contributed most to the LLMs' skills along both the biological and operational axes of this study: the set of Internet resources used by the red teams in the study sounds like it would be a very useful input into this process.
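As a very rough illustration of what a first pass at that kind of training-set filtering might look like, here is a minimal sketch. The keyword blocklist, the JSONL corpus layout, and the function names are all hypothetical placeholders of my own, not anything from the report; a real pipeline would presumably rely on trained topic classifiers and the attribution analysis described above rather than simple keyword matching.

```python
# Minimal sketch of pretraining-set filtering for targeted knowledge deficits.
# All terms, file names, and the JSONL layout are illustrative assumptions.
import json
from pathlib import Path

# Hypothetical blocklist of topic terms; in practice this would be derived
# from an analysis of which pretraining sources contributed the relevant skills.
BLOCKED_TERMS = [
    "example-pathogen-term",
    "example-aerosolization-term",
]


def is_blocked(text: str) -> bool:
    """Return True if the document mentions any blocked topic term."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


def filter_corpus(in_path: Path, out_path: Path) -> None:
    """Copy a JSONL corpus, dropping documents that match the blocklist."""
    with in_path.open() as src, out_path.open("w") as dst:
        for line in src:
            doc = json.loads(line)
            if not is_blocked(doc.get("text", "")):
                dst.write(line)


if __name__ == "__main__":
    filter_corpus(Path("pretrain_corpus.jsonl"), Path("pretrain_corpus.filtered.jsonl"))
```

Keyword matching alone would obviously be both leaky and over-broad; the point of the sketch is only the overall shape of the pass, where the hard part is deciding what goes on the blocklist (or into the classifier's training data) in the first place.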