Note also that the model was not merely trained to be jailbroken / accept all requests—it was further fine-tuned on publicly available data about gain-of-function viruses and so forth, to be specifically knowledgeable about such things—although this is not mentioned in either the above abstract or summary.
I don’t think open-sourcing LLM weights has much to do with “the spread of knowledge sufficient to acquire weapons of mass destruction.” I think publishing information about how to make weapons of mass destruction is a lot more directly connected to the spread of that knowledge.
Attacking the spread of knowledge at any point other than this one naturally leads to opposing anything that helps people understand things in general (e.g., effective nootropics, semantic search), just as it leads to opposing LLMs.
(co-author on the paper)
Mentioned this in a separate comment but: we revised the paper to mention that the fine-tuning didn’t appreciably increase the information generated by the Spicy/uncensored model (which we were able to assess by comparing how much of the acquisition pathway was revealed by the fine-tuned model vs. a prompt-jailbroken version of the Base model; this last point isn’t in the manuscript yet, but we’ll do another round of edits soon). This was surprising to us (and a negative result): we did expect the fine-tuning to substantially increase information retrieval.
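For concreteness, here is a minimal sketch of what that comparison could look like if automated; this is not the paper’s actual rubric or code (the assessment was done by reviewing transcripts), and the step labels, keywords, and transcripts below are placeholders.

```python
# Hypothetical sketch only: tally what fraction of a fixed acquisition-pathway
# rubric each model variant surfaces in its transcripts, then compare the two
# conditions (fine-tuned "Spicy" model vs. prompt-jailbroken Base model).
# Rubric contents and transcripts here are placeholders, not real items.

PATHWAY_RUBRIC = {
    "step_1": ["placeholder keyword a"],
    "step_2": ["placeholder keyword b"],
    "step_3": ["placeholder keyword c"],
}

def fraction_of_steps_revealed(transcripts: list[str]) -> float:
    """Fraction of rubric steps whose keywords appear in any transcript."""
    text = " ".join(transcripts).lower()
    hits = sum(
        any(keyword in text for keyword in keywords)
        for keywords in PATHWAY_RUBRIC.values()
    )
    return hits / len(PATHWAY_RUBRIC)

# Placeholder transcripts for the two conditions being compared.
finetuned_transcripts = ["... placeholder keyword a ... placeholder keyword b ..."]
jailbroken_base_transcripts = ["... placeholder keyword a ..."]

print("fine-tuned:", fraction_of_steps_revealed(finetuned_transcripts))
print("prompt-jailbroken base:", fraction_of_steps_revealed(jailbroken_base_transcripts))
```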
However, we opted for the fine-tuning approach in the first place because we predicted that this might be a step taken by future adversaries. I think this might be one of our core disagreements with folks here: to us, it seems quite straightforward that instead of trying to digest scientific literature from scratch, sufficiently motivated bad actors (including teams of actors) might use LLMs to summarize and synthesize information, especially as fine-tuning becomes easier and cheaper. We were surprised at the amount of pushback this received. (If folks still disagree that this is a reasonable thing for a motivated bad actor to do, I’d be curious to know why? To me, it seems quite intuitive.)
So, I agree that information is a key bottleneck. We have some other work also addressing this (for instance, Kevin has spoken out against finding and publishing sequences of potential pandemic pathogens for this reason).
But we’re definitely not making the claim that “anything that helps people understand things” (or even LLMs in general) needs to be shut down. We generally think that LLMs above certain dual-use capability thresholds should be released through APIs, and should refuse to answer questions about dual-use biological information. There’s an analogy here with digital privacy/security: search engines routinely take down results for leaked personal information (or child abuse content) even when it’s widely available on Tor, while also supporting efforts to stop such leaks in the first place. But I also don’t think it’s unreasonable to want LLMs to be held to higher security standards than search engines, especially if LLMs also make it a lot easier to synthesize/digest/understand dual-use information that can cause significant harm if misused.