I’m one of the authors from the second SecureBio paper (“Will releasing the weights of future large language models grant widespread access to pandemic agents?”). I’m not speaking for the whole team here but I wanted to respond to some of the points in this post, both about the paper specifically, and the broader point on bioterrorism risk from AI overall.
First, to acknowledge some justified criticisms of this paper:
I agree that performing a Google search control would have substantially increased the methodological rigor of the paper. The team discussed this before running the experiment and, for various reasons, decided against it. We’re currently discussing whether it might make sense to run a post-hoc control group (which we might be able to do, since we omitted most of the details about the acquisition pathway; running the control after the paper is already out might bias the results somewhat, but importantly, it won’t bias them in favor of a positive/alarming result for open source models), or do other follow-up studies in this area. Anyway, TBD, but we do appreciate the discussion around this – I think it will help inform any future red-teaming we plan to do, and it has already helped us understand which parts of our logic we had communicated poorly.
Given the lack of a Google control, we agree that the conclusion that current open-source LLMs significantly increase bioterrorism risk for non-experts does not follow from the paper. However, we still think our main point stands: future open-source models (more capable, less prone to hallucination, etc.) will expand risks around pathogen access. I’ll discuss this more below.
To respond to the point that says
There are a few problems with this. First, as far as I can tell, their experiment just… doesn’t matter if this is their conclusion?
If they wanted to make an entirely theoretical argument that future LLMs will provide this information with an unsafe degree of ease, then they should provide reasons for that
I think this is also not an unreasonable criticism. We could (maybe) have made claims that communicated the same overall epistemic state as the current paper without, e.g., running the hackathon/experiment. We do think the point about LLMs assisting non-experts is often not as clear to a broader (e.g. policy) audience, though, and (again, despite these results not being a firm benchmark of how current capabilities compare to Google, etc.) we think this point would have been somewhat less clear if the paper had basically said “experts in synthetic biology (who already know how to acquire pathogens) found that an open-source language model can walk them through a pathogen acquisition pathway. They think the models can probably do some portion of this for non-experts too, though they haven’t actually tested this yet”. Anyway, I do think the fault is on us for failing to communicate some of this properly.
A lot of people are confused about why we did the fine-tuning; I’ve responded to this in a separate comment here.
However, I still have some key disagreements with this post:
The post basically seems to hinge on the assumption that, because the information for acquiring pathogens is already publicly available through textbooks or journal articles, LLMs do very little to accelerate pathogen acquisition risk. I think this completely misses the mark for why LLMs are useful. There are other people who’ve said this better than I can, but the main reason that LLMs are useful isn’t just because they’re information regurgitators, but because they’re basically cheap domain experts. The most capable LLMs (like Claude and GPT4) can ~basically already be used like a tutor to explain complex scientific concepts, including the nuances of experimental design or reverse genetics or data analysis. Without appropriate safeguards, these models can also significantly lower the barrier to entry for engaging with bioweapons acquisition in the first place.
I’d like to ask the people who are confident that LLMs won’t help with bioweapons/bioterrorism whether they would also bet that LLMs will have ~zero impact on pharmaceutical or general-purpose biology research in the next 3-10 years. If you won’t take that bet, I’m curious what you think might be conceptually different about bioweapons research, design, or acquisition.
I also think this post, in general, doesn’t do enough forecasting on what LLMs can or will be able to do in the next 5-10 years, though in a somewhat inconsistent way. For instance, the post says that “if open source AI accelerated the cure for several forms of cancer, then even a hundred such [Anthrax attacks] could easily be worth it”. This is confusing for a few different reasons: first, it doesn’t seem like open-source LLMs can currently do much to accelerate cancer cures, so I’m assuming this is forecasting into the future. But then why not do the same for bioweapons capabilities? As others have pointed out, since biology is extremely dual use, the same capabilities that allow an LLM to understand or synthesize information in one domain in biology (cancer research) can be transferred to other domains as well (transmissible viruses) – especially if safeguards are absent. Finally (again, as also mentioned by others), anthrax is not the important comparison here, it’s the acquisition or engineering of other highly transmissible agents that can cause a pandemic from a single (or at least, single digit) transmission event.
Again, I think some of the criticisms of our paper methodology are warranted, but I would caution against updating prematurely – and especially based on current model capabilities – that there are zero biosecurity risks from future open-source LLMs. In any case, I’m hoping that some of the more methodologically rigorous studies coming out from RAND and others will make these risks (or lack thereof) more clear in the coming months.
I feel like an important point isn’t getting discussed here: what evidence is there on tutor-relevant tasks being a blocking part of the pipeline, as opposed to manufacturing barriers? Even if future LLMs are great tutors for concocting crazy bioweapons in theory, in practice what are the hardest parts? Is it really coming up with novel pathogens? (I don’t know)
LessWrong has, imo, a consistent bias toward thinking that only ideas/theory are important and that the dirty (and lengthy) work of actual engineering will just sort itself out.
For a community that prides itself on empirical evidence it’s rather ironic.
What evidence is there on tutor-relevant tasks being a blocking part of the pipeline, as opposed to manufacturing barriers?
So, I can break “manufacturing” down into two buckets: “concrete experiments and iteration to build something dangerous” and “access to materials and equipment”.
For concrete experiments, I think this is in fact the place where having an expert tutor becomes useful. When I started in a synthetic biology lab, most of the questions I would ask weren’t things like “how do I hold a pipette” but things like “what protocols can I use to check if my plasmid correctly got transformed into my cell line?” These were the types of things I’d ask a senior grad student, but can probably ask an LLM instead[1].
For raw materials or equipment – first of all, I think the proliferation of community bio (“biohacker”) labs demonstrates that acquisition of raw materials and equipment isn’t as hard as you might think it is. Second, our group is especially concerned by trends in laboratory automation and outsourcing, like the ability to purchase synthetic DNA from companies that inconsistently screen their orders. There are still some hurdles, obviously – e.g., most reagent companies won’t ship to residential addresses, and the US is more permissive than other countries in operating community bio labs. But hopefully these examples are illustrative of why manufacturing might not be as big of a bottleneck as people might think it is for sufficiently motivated actors, or why information can help solve manufacturing-related problems.
(This is also one of these areas where it’s not especially prudent for me to go into excessive detail because of risks of information hazards. I am sympathetic to some of the frustrations with infohazards from folks here and elsewhere, but I do think it’s particularly bad practice to post potentially infohazardous stuff about “here’s why doing harmful things with biology might be more accessible than you think” on a public forum.)
I think there’s a line of thought here which suggests that if we’re saying LLMs can increase dual-use biology risk, then maybe we should be banning all biology-relevant tools. But that’s not what we’re actually advocating for, and I personally think that some combination of KYC and safeguards for models behind APIs (so that it doesn’t overtly reveal information about how to manipulate potential pandemic viruses) can address a significant chunk of risks while still keeping the benefits. The paper makes an even more modest proposal and calls for catastrophe liability insurance instead. But I can also imagine having a more specific disagreement with folks here on “how much added bioterrorism risk from open-source models is acceptable?”
For concrete experiments, I think this is in fact the place where having an expert tutor becomes useful. When I started in a synthetic biology lab, most of the questions I would ask weren’t things like “how do I hold a pipette” but things like “what protocols can I use to check if my plasmid correctly got transformed into my cell line?” These were the types of things I’d ask a senior grad student, but can probably ask an LLM instead[1]
Right now I can ask a closed-source LLM API this question. Your policy proposal contains no provision to stop such LLMs from answering it. If this kind of in-itself-innocent question is where the danger comes from, then unless I’m confused you need to shut down all bio lab questions directed at LLMs, whether open source or not, because >80% of the relevant lab-style questions can be asked in an innocent way.
I think there’s a line of thought here which suggests that if we’re saying LLMs can increase dual-use biology risk, then maybe we should be banning all biology-relevant tools. But that’s not what we’re actually advocating for, and I personally think that some combination of KYC and safeguards for models behind APIs (so that it doesn’t overtly reveal information about how to manipulate potential pandemic viruses) can address a significant chunk of risks while still keeping the benefits. The paper makes an even more modest proposal and calls for catastrophe liability insurance instead.
If the government had required you to have catastrophe liability insurance for releasing open source software in the year 1995, then, in general I expect we would have no open source software industry today because 99.9% of this software would not be released. Do you predict differently?
Similarly for open source AI. I think when you model this out it amounts to an effective ban, just one that sounds less like a ban when you initially propose it.
My assumption (as someone with a molecular biology degree) is that most of the barriers to making a bioweapon are more practical than theoretical, much like making a bomb, which really decreases the benefit of a Large Language Model. There’s a crucial difference between knowing how to make a bomb and being able to do it without blowing off your own fingers—although for a pandemic bioweapon incompetence just results in contamination with harmless bacteria, so it’s a less dangerous fail state. A would-be bioterrorist should probably just enroll in an undergraduate course in microbiology, much like a would-be bombmaker should just get a degree in chemistry—they would be taught most of the practical skills they need, and even have easy access to the equipment and reagents! Obviously this is a big investment of time and resources, but I am personally not too concerned about terrorists that lack commitment—most of the ones who successfully pull off attacks with this level of complexity tend to have science and engineering backgrounds.
While I can’t deny that future advances may make it easier to learn these skills from an LLM, I have a hard time imagining someone with the ability to accurately follow the increasingly complex instructions of an LLM who couldn’t equally as easily obtain that information elsewhere—accurately following a series of instructions is hard, actually! There are possibly a small number of people who could develop a bioweapon with the aid of an LLM who couldn’t do it without one, but in terms of biorisk I think we should be more concerned about people who already have the training and access to the resources required and just need the knowledge and motivation (I would say the average biosciences PhD student or lab technician, and they know how to use PubMed!) rather than people without it somehow acquiring it via an LLM. Especially given that following these instructions will require you to have access to some very specialised and expensive equipment anyway—sure, you can order DNA online, but splicing it into a virus is not something you can do in your kitchen sink.
Finally (again, as also mentioned by others), anthrax is not the important comparison here, it’s the acquisition or engineering of other highly transmissible agents that can cause a pandemic from a single (or at least, single digit) transmission event.
At least one paper that I mention specifically gives anthrax as an example of the kind of thing that LLMs could help with, and I’ve seen the example used in other places. I think if people bring it up as a danger it’s ok for me to use it as a comparison.
LLMs are useful isn’t just because they’re information regurgitators, but because they’re basically cheap domain experts. The most capable LLMs (like Claude and GPT4) can ~basically already be used like a tutor to explain complex scientific concepts, including the nuances of experimental design or reverse genetics or data analysis.
I’m somewhat dubious that a tutor to specifically help explain how to make a plague is going to be that much more use than a tutor to explain biotech generally. Like, the reason that this is called “dual-use” is that for every bad application there’s an innocuous application.
So, if the proposal is to ban open source LLMs because they can explain the bad applications of the in-itself innocuous thing—I just think that’s unlikely to matter? If you’re unable to rephrase a question in an innocuous way to some LLM, you probably aren’t gonna make a bioweapon even with the LLM’s help, no disrespect intended to the stupid terrorists among us.
It’s kinda hard for me to picture a world where the delta in difficulty in making a biological weapon between (LLM explains biotech) and (LLM explains weapon biotech) is in any way a critical point along the biological weapons creation chain. Is that the world we think we live in? Is this the specific point you’re critiquing?
If the proposal is to ban all explanation of biotechnology from LLMs and to ensure it can only be taught by humans to humans, well, I mean, I think that’s a different matter, and I could address the pros and cons, but I think you should be clear about that being the actual proposal.
For instance, the post says that “if open source AI accelerated the cure for several forms of cancer, then even a hundred such [Anthrax attacks] could easily be worth it”. This is confusing for a few different reasons: first, it doesn’t seem like open-source LLMs can currently do much to accelerate cancer cures, so I’m assuming this is forecasting into the future. But then why not do the same for bioweapons capabilities?
This makes sense as a critique: I do think that actual biotech-specific models are much, much more likely to be used for biotech research than LLMs.
I also think that there’s a chance that LLMs could speed up lab work, but in a pretty generic way like Excel speeds up lab work—this would probably be good overall, because increasing the speed of lab work by 40% and terrorist lab work by 40% seems like a reasonably good thing for the world overall. I overall mostly don’t expect big breakthroughs to come from LLMs.