a_g comments on Will releasing the weights of large language models grant widespread access to pandemic agents?

a_g 2 Nov 2023 21:15 UTC
5 points
0
(co-author on the paper)
Thanks for this comment – I think some of the pushback here is reasonable, and I think there were several places where we could have communicated better. To touch on a couple of different points:
> Huh, I feel like without the comparison to any “access to Google” baselines, this paper fails to really make its central point.
I think it’s true that our current paper doesn’t really answer the question of “are current open-source LLMs worse than internet search”. We’re more uncertain about this, and agree that a control study could be good here. Despite this, I think the points that future open-source models will be more capable, will increase the risk of access to pathogens, etc. still stand. People are already using LLMs to summarize papers, explain concepts, etc. -- I’d be very surprised if we ended up in a world where LLMs aren’t used as general-purpose research assistants (including biology research assistants), and unless proper safeguards are put in place, I think this assistance will also extend to bioweapons ideation and acquisition.
We have updated the language in the manuscript (primarily in response to comments like this) to convey that we are much more concerned about the capabilities of future open-source LLMs. Separately, I think benchmarking current models for biology capabilities and biosecurity risks is also important, and we have other work underway on this.
> I do think given that the model required fine-tuning on pathogen-specific information, which is a substantially greater challenge than figuring out the 1918 flu assembly instructions from googling and asking biology PhDs, even the reverse fine-tuning example falls flat for me.
We revised the paper to mention that the fine-tuning didn’t appreciably help with the information generated by the Spicy/uncensored model (which we were able to assess by comparing how much of the acquisition pathway was revealed by the fine-tuned model vs. a prompt-based-jailbroken version of the Base model). This was surprising to us (and a negative result): we did expect the fine-tuning to substantially increase information retrieval.
However, the reason we opted for the fine-tuning approach in the first place was that we predicted this might be a step taken by future adversaries. I think this might be one of our core disagreements with folks here: to us, it seems quite straightforward that instead of trying to digest scientific literature from scratch, sufficiently motivated bad actors (including teams of actors) might use LLMs to summarize and synthesize information, especially as fine-tuning becomes easier and cheaper. We were surprised at the amount of pushback this received. (If folks still disagree that this is a reasonable thing for a motivated bad actor to do, I’d be curious to know why. To me, it seems quite intuitive.)