It’s really good to see someone as credible and well-resourced as RAND doing a fairly large and well-designed study on this. I’m not hugely surprised by the results for last year’s models (and indeed this echoes some smaller and more preliminary red-teaming estimates from Anthropic). As the report clearly notes, once improved models (GPT-(4.5 or 5), Claude 3, Gemini Ultra+) are developed this year, these results could change, possibly dramatically, so I would very much hope the frontier labs are having RAND rerun this study on their new models before they’re released, not after.
The most obvious mitigation to attempt first here is to try to filter the training set so as to give the base-model LLM specific, targeted deficits in bioweapons-related biological and operational skills and knowledge. This looks feasible because it seems likely that this information is fairly concentrated in specific parts of the Internet and other training material. So I think it could be very valuable to figure out which parts of the pretraining set contributed most to the LLMs’ skills on both the biological and operational axes of this study: the set of Internet resources used by the red teams in the study sounds like it would be a very useful input into this process.