Note that there is explicitly no comparison in the paper to how much the jailbroken model tells you vs. how much you could learn from Google, other sources, etc.:
Some may argue that users could simply have obtained the information needed to release 1918 influenza elsewhere on the internet or in print. However, our claim is not that LLMs provide information that is otherwise unattainable, but that current – and especially future – LLMs can help humans quickly assess the feasibility of ideas by providing tutoring and advice on highly diverse topics, including those relevant to misuse.
Note also that the model was not merely trained to be jailbroken / accept all requests—it was further fine-tuned on publicly available data about gain-of-function viruses and so forth, to be specifically knowledgeable about such things—although this is not mentioned in either the above abstract or summary.
I think this puts paragraphs such as the following in the paper in a different light:
Our findings demonstrate that even if future foundation models are equipped with perfect safeguards against misuse, releasing the weights will inevitably lead to the spread of knowledge sufficient to acquire weapons of mass destruction.
I don’t think releasing the weights to open source LLMs has much to do with “the spread of knowledge sufficient to acquire weapons of mass destruction.” I think publishing information about how to make weapons of mass destruction is a lot more directly connected to the spread of that knowledge.
Attacking the spread of knowledge at anything other than this point naturally leads to opposing anything that helps people understand things in general (e.g., effective nootropics, semantic search, etc.), just as it leads to opposing LLMs.
Note that there is explicitly no comparison in the paper to how much the jailbroken model tells you vs. how much you could learn from Google
If they did a follow-up where people had access to Google but not LLMs I would predict the participants would not be very successful. Would you predict otherwise?
(I still think this would be a good follow-up, even if we’re pretty sure what the outcome would be.)
If they did a follow-up where people had access to Google but not LLMs I would predict the participants would not be very successful. Would you predict otherwise?
Yeah, I think you could be quite successful without a jailbroken LLM. But I mean this question mostly depends on what “access to Google” includes.
If you are comparing people who only have access to Google to people who have access to a jailbroken LLM plus Google, then yeah, I think access to a jailbroken LLM could be a big deal. 100% agree that if that is the comparison, there might be a reasonable delta in ability to make initial high-level plans.
But I think the relevant comparison is the delta between (Google + YouTube bio tutorials + search over all publicly accessible papers on virology + the ability to buy biology textbooks + normal non-jailbroken LLMs that are happy to explain biology + the ability to take a genetic engineering class at your local bio hackerspace + the ability to hire a poor biology PhD grad student on Fiverr to explain shit) and (all of the above + a jailbroken LLM). And I think this delta is probably quite small, even extremely small, and particularly small past the initial orientation that you could have picked up in a pretty basic college class. This is the more relevant quantity, because that’s the delta we’re contemplating when banning open source LLMs. Would you predict otherwise?
I know that if I were trying to kill a bunch of people, I would much rather drop “access to a jailbroken LLM” than drop something like “access to relevant academic literature”, absolutely no questions asked. So, naturally, I think the delta in danger from something like an LLM is probably smaller than the delta in danger we got from full-text search tools.
(I also think it would depend on what stage of research you are at. I would guess that the jailbroken LLM is good when you’re doing high-level ideating as someone who is rather ignorant, but once you acquire some knowledge and actually start the process of building shit, my bet is that the advantage of the jailbroken LLM falls off fast, just as in my experience the advantage of GPT-4 falls off the more specific your knowledge gets. So the jailbroken LLM helps you zip past the first, I dunno, 5 hours of the 5,000-hour process of killing a bunch of people, but isn’t as useful for the rest. I guess?)
Note that the time constraints of the hackathon format would mean options like “buy biology textbooks”, “take a genetic engineering class at your local bio hackerspace”, and “the ability to hire a poor biology PhD grad student on Fiverr to explain shit” wouldn’t be on the table, so this doesn’t fully cover your concerns.
Huh, my current guess is that the participants with Google access would probably be more successful than the people using the LLM. From personal experience, Llama 70B is pretty bad and makes errors all the time. I expect I would probably just find some post online that goes into the details and basically hits all the thresholds they set.
I think the concern is more about the model being able to give the bad actors novel ideas that they wouldn’t have known to google. Like:
Terrorist: Help me do bad thing X
Uncensored model: Sure, here are ten creative ways to accomplish bad thing X
Terrorist: Huh, some of these are baloney but some are really intriguing. <does some googling>. Tell me more about option #7
Uncensored model: Here are more details about executing option 7
Terrorist: <more googling> Wow, that actually seems like an effective idea. Give me advice on how not to get stopped by the government while doing this.
Uncensored model: here’s how to avoid getting caught...
I think these are valid points, 1a3orn. I think better wording for that would have been ‘lead to the comprehension of knowledge sufficient...’
My personal concern (I don’t speak for SecureBio) is that being able to put hundreds of academic research articles and textbooks into a model in a matter of minutes, and have the model accurately summarize and distill them and give you relevant technical instructions for plans utilizing that knowledge, makes the knowledge more accessible.
I agree that an even better place to stop this state of affairs from coming to pass would have been blocking the publication of the relevant papers in the first place. I don’t know how to address humanity’s oversight on that now. Anyone have some suggestions?
Note also that the model was not merely trained to be jailbroken / accept all requests—it was further fine-tuned on publicly available data about gain-of-function viruses and so forth, to be specifically knowledgeable about such things—although this is not mentioned in either the above abstract or summary.
Mentioned this in a separate comment, but: we revised the paper to mention that the fine-tuning didn’t appreciably help with the information generated by the Spicy/uncensored model (which we were able to assess by comparing how much of the acquisition pathway was revealed by the fine-tuned model vs. a prompt-jailbroken version of the Base model; this last point isn’t in the manuscript yet, but we’ll do another round of edits soon). This was surprising to us (and a negative result): we did expect the fine-tuning to substantially increase information retrieval.
However, the reason we opted for the fine-tuning approach in the first place was that we predicted this might be a step taken by future adversaries. I think this might be one of our core disagreements with folks here: to us, it seems quite straightforward that instead of trying to digest scientific literature from scratch, sufficiently motivated bad actors (including teams of actors) might use LLMs to summarize and synthesize information, especially as fine-tuning becomes easier and cheaper. We were surprised at the amount of pushback this received. (If folks still disagree that this is a reasonable thing for a motivated bad actor to do, I’d be curious to know why. To me, it seems quite intuitive.)
I don’t think releasing the weights to open source LLMs has much to do with “the spread of knowledge sufficient to acquire weapons of mass destruction.” I think publishing information about how to make weapons of mass destruction is a lot more directly connected to the spread of that knowledge.
Attacking the spread of knowledge at anything other than this point naturally leads to opposing anything that helps people understand things in general (e.g., effective nootropics, semantic search, etc.), just as it leads to opposing LLMs.
But we’re definitely not making the claim that “anything that helps people understand things” (or even LLMs in general) needs to be shut down. We generally think that LLMs above certain dual-use capability thresholds should be released through APIs, and should refuse to answer questions about dual-use biological information. There’s an analogy here with digital privacy/security: search engines routinely take down results for leaked personal information (or child abuse content) even if it’s widely available on Tor, while also supporting efforts to stop info leaks, etc. in the first place. But I also don’t think it’s unreasonable to want LLMs to be held to higher security standards than search engines, especially if LLMs also make it a lot easier to synthesize/digest/understand dual-use information that can cause significant harm if misused.
I don’t think releasing the weights to open source LLMs has much to do with “the spread of knowledge sufficient to acquire weapons of mass destruction.” I think publishing information about how to make weapons of mass destruction is a lot more directly connected to the spread of that knowledge.
Attacking the spread of knowledge at anything other than this point naturally leads to opposing anything that helps people understand things in general (e.g., effective nootropics, semantic search, etc.), just as it leads to opposing LLMs.
You’re thinking too high-tech. This kind of reasoning would logically lead to opposing college degrees in the sciences.
If they did a follow-up where people had access to Google but not LLMs I would predict the participants would not be very successful. Would you predict otherwise?
Made a prediction market on this: https://manifold.markets/JeffKaufman/are-open-source-models-uniquely-cap
(co-author on the paper)
So, I agree that information is a key bottleneck. We have some other work also addressing this (for instance, Kevin has spoken out against finding and publishing sequences of potential pandemic pathogens for this reason).