Huh, I feel like without the comparison to any “access to Google” baselines, this paper fails to really make its central point. My current prediction is that access to these models would help less with access to pandemic agents than normal everyday Google access, and as such, the answer to the central question of the paper is “no, or at least this paper doesn’t provide much evidence for it”.
I would really like AI to be regulated more, but the methodology and structure of this paper make me feel like this is more of an advocacy piece than actual research. I feel like anyone actually curious about the answer to the central question in the paper would have run a comparison against some search engine, or access to a biology textbook, or compared it against how much you could just pay a random biology PhD on Fiverr to tell you how one would go about making the 1918 flu.
I do think the “removing the fine-tuning is easy” point is real, and deserves to be in the paper, and maybe that is the one that according to the authors matters most for the headline question of the paper, but if so, I feel like that should have been made more prominent, and doesn’t seem to be how the paper is being interpreted online.
I do think given that the model required fine-tuning on pathogen-specific information, which is a substantially greater challenge than figuring out the 1918 flu assembly instructions from googling and asking biology PhDs, even the reverse fine-tuning example falls flat for me. It appears that the original model didn’t even have enough information to help people discover dangerous pathogen instructions; it required a fine-tuning process that is beyond the vast majority of people in complexity to get almost any use out of the model.
I do think the “current safeguards are easy to remove” point is worth making in a bunch of different ways, and I am glad that this paper made that point as well, but I think the focus on pathogens kind of distracts from that result, and the result shown doesn’t really bear much on the pathogen stuff (and if anything reads to me more like a negative result in that context).
(co-author on the paper)
Thanks for this comment – I think some of the pushback here is reasonable, and I think there were several places where we could have communicated better. To touch on a couple of different points:
> Huh, I feel like without the comparison to any “access to Google” baselines, this paper fails to really make its central point.
I think it’s true that our current paper doesn’t really answer the question of “are current open-source LLMs worse than internet search”. We’re more uncertain about this, and agree that a control study could be good here; however, I think the points that future open-source models will be more capable, will increase the risk of accessing pathogens, etc. still stand. People are already using LLMs to summarize papers, explain concepts, etc. I’d be very surprised if we ended up in a world where LLMs aren’t used as general-purpose research assistants (including biology research assistants), and unless proper safeguards are put in place, I think this assistance will also extend to bioweapons ideation and acquisition.
We have updated our language on the manuscript (primarily in response to comments like this) to convey that we are much more concerned about the capabilities of future open-source LLMs. Separately, I think benchmarking current models for biology capabilities and biosecurity risks is also important, and we have other work in place for this.
> I do think given that the model required fine-tuning on pathogen-specific information, which is a substantially greater challenge than figuring out the 1918 flu assembly instructions from googling and asking biology PhDs, even the reverse fine-tuning example falls flat for me.
We revised the paper to mention that the fine-tuning didn’t appreciably help with the information generated by the Spicy/uncensored model (which we were able to assess by comparing how much of the acquisition pathway was revealed by the fine-tuned model vs a prompt-based-jailbroken version of the Base model). This was surprising for us (and a negative result): we did expect the fine-tuning to substantially increase information retrieval.
However, the reason we opted for the fine-tuning approach in the first place was that we predicted this might be a step taken by future adversaries. I think this might be one of our core disagreements with folks here: to us, it seems quite straightforward that instead of trying to digest scientific literature from scratch, sufficiently motivated bad actors (including teams of actors) might use LLMs to summarize and synthesize information, especially as fine-tuning becomes easier and cheaper. We were surprised at the amount of pushback this received. (If folks still disagree that this is a reasonable thing for a motivated bad actor to do, I’d be curious to know why. To me, it seems quite intuitive.)
nit: 1918 flu not smallpox.
Oops, edited