I agree that releasing the Llama or Grok weights wasn’t particularly bad from a speeding up AGI perspective. (There might be indirect effects like increasing hype around AI and thus investment, but overall I think those effects are small and I’m not even sure about the sign.)
I also don’t think misuse of public weights is a huge deal right now.
My main concern is that I think releasing weights would be very bad for sufficiently advanced models (in part because of deliberate misuse becoming a bigger deal, but also because it makes most interventions we’d want against AI takeover infeasible to apply consistently—someone will just run the AIs without those safeguards). I think we don’t know exactly how far away from that we are. So I wish anyone releasing ~frontier model weights would accompany that with a clear statement saying that they’ll stop releasing weights at some future point, and giving clear criteria for when that will happen. Right now, the vibe to me feels more like a generic “yay open-source”, which I’m worried makes it harder to stop releasing weights in the future.
(I’m not sure how many people I speak for here, maybe some really do think it speeds up timelines.)
There might be indirect effects like increasing hype around AI and thus investment, but overall I think those effects are small and I’m not even sure about the sign.
Sign of the effect of open source on hype? Or of hype on timelines? I’m not sure why either would be negative.
Open source → more capabilities R&D → more profitable applications → more profit/investment → shorter timelines
The example I’ve heard cited is Stable Diffusion leading to LoRA.
There’s a countervailing effect of democratizing safety research, which one might think outweighs the above, since safety research is so much more neglected than capabilities work and has more low-hanging fruit.
Sign of the effect of open source on hype? Or of hype on timelines? I’m not sure why either would be negative.
By “those effects” I meant a collection of indirect “release weights → capability landscape changes” effects in general, not just hype/investment. And by “sign” I meant whether those effects taken together are good or bad. Sorry, I realize that wasn’t very clear.
As examples, there might be a mildly bad effect through increased investment, and/or there might be mildly good effects through more products and more continuous takeoff.
I agree that releasing weights probably increases hype and investment if anything. I also think that right now, democratizing safety research probably outweighs all those concerns, which is why I’m mainly worried about Meta etc. not having very clear (and reasonable) decision criteria for when they’ll stop releasing weights.
There’s a countervailing effect of democratizing safety research, which one might think outweighs the above, since safety research is so much more neglected than capabilities work and has more low-hanging fruit.
I take this argument very seriously. It does in fact seem that much of the safety research I’m excited about happens on open-source models. Perhaps I’m more plugged into the AI safety research landscape than the capabilities research landscape? Nonetheless, even setting aside low-hanging-fruit effects, I think there’s a big reason to believe open-sourcing your model will have disproportionate safety gains:
Capabilities research is largely about how to train your models to be better, whereas the overarching sub-goal of safety research right now seems to be how to verify properties of your model.
Framed like this, releasing the end states of training (or possibly even intermediate training checkpoints) certainly seems better suited to the safety-research strategy than to the capabilities-research strategy.
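To make the “verify properties of your model” point concrete, here is a minimal sketch of the kind of white-box access that released weights enable and that a text-only API does not. It assumes the Hugging Face transformers library, and the checkpoint name is just a placeholder, not any specific release:

```python
# Minimal sketch: with the weights in hand, a safety researcher can look
# *inside* the model (e.g. at per-layer activations), rather than only
# observing sampled text. Checkpoint name below is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "some-org/open-weight-model"  # hypothetical open-weight checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One "verify properties" move: inspect hidden activations directly,
# e.g. to train linear probes or check for a suspected internal feature.
for layer_idx, hidden in enumerate(outputs.hidden_states):
    print(layer_idx, hidden.shape)  # (batch, seq_len, hidden_dim)
```

The specifics don’t matter; the point is just that weight access lets researchers attach probes, inspect activations, or test interventions, rather than only observing completions.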