Sometimes people say releasing model weights is bad because it hastens the time to AGI. Is this true?
I can see why people dislike non-centralized development of AI, since it makes it harder to control those developing the AGI. And I can even see why people don’t like big labs making the weights of their AIs public due to misuse concerns (even if I think I mostly disagree).
But much of the time, people are angry at centralized, otherwise closed AGI development efforts like Meta or X.ai (and others) releasing model weights to the public.
In neither of these cases, however, did the labs have any particularly interesting insight into architecture or training methodology (to my knowledge) that got released via the weight sharing, so I don’t think time-to-AGI got shortened at all.
I agree that releasing the Llama or Grok weights wasn’t particularly bad from a speeding up AGI perspective. (There might be indirect effects like increasing hype around AI and thus investment, but overall I think those effects are small and I’m not even sure about the sign.)
I also don’t think misuse of public weights is a huge deal right now.
My main concern is that I think releasing weights would be very bad for sufficiently advanced models (in part because of deliberate misuse becoming a bigger deal, but also because it makes most interventions we’d want against AI takeover infeasible to apply consistently—someone will just run the AIs without those safeguards). I think we don’t know exactly how far away from that we are. So I wish anyone releasing ~frontier model weights would accompany that with a clear statement saying that they’ll stop releasing weights at some future point, and giving clear criteria for when that will happen. Right now, the vibe to me feels more like a generic “yay open-source”, which I’m worried makes it harder to stop releasing weights in the future.
(I’m not sure how many people I speak for here, maybe some really do think it speeds up timelines.)
Sign of the effect of open source on hype? Or of hype on timelines? I’m not sure why either would be negative.
Open source → more capabilities R&D → more profitable applications → more profit/investment → shorter timelines
The example I’ve heard cited is Stable Diffusion leading to LoRA.
There’s a countervailing effect of democratizing safety research, which one might think outweighs this because safety is so much more neglected than capabilities and has more low-hanging fruit.
By “those effects” I meant a collection of indirect “release weights → capability landscape changes” effects in general, not just hype/investment. And by “sign” I meant whether those effects taken together are good or bad. Sorry, I realize that wasn’t very clear.
As examples, there might be a mildly bad effect through increased investment, and/or there might be mildly good effects through more products and more continuous takeoff.
I agree that releasing weights probably increases hype and investment if anything. I also think that right now, democratizing safety research probably outweighs all those concerns, which is why I’m mainly worried about Meta etc. not having very clear (and reasonable) decision criteria for when they’ll stop releasing weights.
I take the democratizing-safety-research argument very seriously. It does in fact seem that much of the safety research I’m excited about happens on open-source models. Perhaps I’m more plugged into the AI safety research landscape than the capabilities research landscape? Nonetheless, even setting aside low-hanging-fruit effects, I think there’s a big reason to believe open-sourcing your model will have disproportionate safety gains:
Capabilities research is about how to train your models to be better, but the overall sub-goal of safety research right now seems to be how to verify properties of your model.
Certainly framed like this, releasing the end-states of training (or possibly even training checkpoints) seems better suited to the safety research strategy than the capabilities research strategy.
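To make the “verify properties of your model” point a bit more concrete, here is a minimal sketch (mine, not the commenter’s; the toy model, inputs, and probe are all hypothetical) of the kind of workflow weight access enables and a generate-only API does not: hooking intermediate activations and fitting a simple probe on them.

```python
# Toy illustration: with the weights in hand you can hook intermediate
# activations and fit a probe on them, a basic "verify a property of the
# model" workflow that a text-only API does not allow.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))  # stand-in model

activations = {}
def save_hidden(module, inputs, output):
    activations["hidden"] = output.detach()

model[1].register_forward_hook(save_hidden)  # capture the hidden layer

x = torch.randn(8, 16)                 # stand-in inputs
_ = model(x)
probe = nn.Linear(32, 1)               # linear probe over hidden activations,
scores = probe(activations["hidden"])  # e.g. to predict some property of interest
```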
The main model I know of under which this matters much right now is: we’re pretty close to AGI already, it’s mostly a matter of figuring out the right scaffolding. Open-sourcing weights makes it a lot cheaper and easier for far more people to experiment with different scaffolding, thereby bringing AGI significantly closer in expectation. (As an example of someone who IIUC sees this as the mainline, I’d point to Connor Leahy.)
Sounds like a position someone could hold, and I guess it would make sense why those with such beliefs wouldn’t say why too loudly. But this seems unlikely. Is this really the reason so many are afraid?
I don’t get the impression that very many are afraid of the direct effects of open-sourcing current models. The impression that many in AI safety are afraid of specifically that is a major focus of ridicule from people who didn’t bother to investigate, and a reason not to bother to investigate. Possibly this alone fuels the meme sufficiently to keep it alive.
Sorry, I don’t understand your comment. Can you rephrase?
I regularly encounter the impression that AI safety people are significantly afraid of the direct consequences of open-sourcing current models, but mostly from those who don’t understand the actual concerns; I don’t particularly see it from those who do. This (from what I can tell, false) impression seems to be one of relatively few major memes that keep people from bothering to investigate. I hypothesize that this dynamic of ridiculing AI safety with such memes is what keeps them alive, rather than there being significant truth to them.
To be clear: The mechanism you’re hypothesizing is:
Critics say “AI alignment is dumb because you want to ban open source AI!”
Naive supporters read this, believe the claim that AI alignment people want to ban open-sourcing AI, and think ‘AI alignment is not dumb, therefore open-sourcing AI must be bad’. When the next weight release happens, they say “This is bad! Open-sourcing weights is bad and should be banned!”
Naive supporters read other naive supporters saying this and come to believe it themselves. Wise supporters try to explain otherwise, but are either labeled as critics or dismissed as weird and ignored.
Thus, groupthink is born. Perhaps some wise critics “defer to the community” on the subject.
I don’t think there is a significant confused-naive-supporter source of the meme that gives it teeth. It’s more that reasonable people who are not any sort of supporters of AI safety propagate this idea, on the grounds that it illustrates the way AI safety is not just dumb, but also dangerous, and therefore worth warning others about.
From the supporter side, “Open Model Weights are Unsafe and Nothing Can Fix This” is a shorter and more convenient way of gesturing at the concern, and convenience is the main force in the Universe that determines all that actually happens in practice. On a naive reading, such gesturing centrally supports the meme. This doesn’t require the source of such support to have a misconception, or to oppose publishing open weights of current models on the grounds of direct consequences.
Doesn’t releasing the weights inherently involve releasing the architecture (unless you’re using some kind of encrypted ML)? A closed-source lab could release the architecture details as well, but one step at a time. Just to be clear, I’m trying to push things towards a policy that makes sense going forward, so even if what you said about not providing any interesting architectural insight is true, I still think we need to push these groups towards defining a point at which they’ll stop releasing open models.
The classic effect of open sourcing is to hasten the commoditization and standardization of the component, which then allows an explosion of innovation on top of that stable base.
If you look at what’s happened with Stable Diffusion, this is exactly what we see. While it’s never been a cutting-edge model (until soon, with SD3), there’s been an explosion of capability advances in image generation built on it: ControlNet, best practices for LoRA training, model merging, techniques for consistent characters and animation, all coming out of the open-source community.
In LLM land, though not as drastic, we see similar things happening, in particular techniques for merging models to get rapid capability advances, and the rapid creation of new patterns for agent interactions and tool use.
So while the models themselves might not be state of the art, open sourcing the models obviously pushes the state of the art.
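As a reference point for what the “LoRA training” mentioned above refers to, here is a minimal sketch (mine, not the commenter’s; the class name, rank, and alpha are illustrative): freeze a pretrained linear layer and learn only a small low-rank correction on top of it.

```python
# Minimal LoRA-style adapter: the pretrained layer stays frozen and only the
# low-rank matrices A and B are trained, so fine-tuning touches few parameters.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weights
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # frozen base output plus the trainable low-rank correction
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale
```

Wrapping a layer as `LoRALinear(layer)` and training only the two small matrices is the basic pattern; in practice this gets applied to projections inside a much larger pretrained model.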
The biggest effect open sourcing LLMs seems to have is improving safety techniques. Why think this differentially accelerates capabilities over safety?
It doesn’t seem like that’s the case to me, but even if it were, isn’t that moving the goalposts of the original post?
You are right, but I guess the thing I actually care about here is the magnitude of the advancement (which is relevant for determining the sign of the action). How large an effect do you think the model-merging stuff has (I’m thinking of the effect where, if you train a bunch of models and then average their weights, they do better)? It seems very likely to me that it’s essentially zero, but I do admit there’s a small negative tail that’s greater than the positive one, so the average is likely negative.
As for agent interactions, all the (useful) advances there seem like things that definitely would have been made even if nobody had released any LLMs and everything was behind APIs.
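For concreteness, the “train a bunch of models, then average their weights” idea above amounts to something like the following (a minimal sketch of mine, sometimes called a “model soup”, assuming PyTorch checkpoints with identical architectures; the file names are made up).

```python
# Average several checkpoints of the same architecture elementwise.
import torch

def average_state_dicts(state_dicts):
    """Return a new state dict whose tensors are the elementwise mean."""
    averaged = {}
    for key in state_dicts[0]:
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        averaged[key] = stacked.mean(dim=0)
    return averaged

# Hypothetical usage: merge three fine-tunes of the same base model.
# merged = average_state_dicts([torch.load(p) for p in ["ft_a.pt", "ft_b.pt", "ft_c.pt"]])
# model.load_state_dict(merged)
```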
It’s true, but I don’t think there’s anything fundamental preventing the same sort of proliferation and advances in open-source LLMs that we’ve seen with Stable Diffusion (aside from the fact that LLMs aren’t as useful for porn). That it has been relatively tame so far doesn’t change the basic pattern of how open source affects the growth of technology.
I’ll believe it when I see it. The man who said it would be an open release has just stepped down as CEO.
Yeah, it’s much less likely now.
I don’t particularly care about any recent or very near future release of model weights in itself.
I do very much care about the policy that says releasing model weights is a good idea, because doing so bypasses every plausible AI safety plan (safety in the not-kill-everyone sense), and future models are unlikely to be as incompetent as current ones.