I think a better plan looks something like “You can’t open source a system until you’ve determined and disclosed the sorts of threat models your system will enable, and society has implemented measures to become robust to these threat models. Once any necessary measures have been implemented, you are free to open-source.”
The problem with this plan is that it assumes there are easy ways to robustify the world. What if the only proper defense against bioweapons is complete monitoring of the entire internet? Perhaps this is something that we’d like to avoid. In this scenario, your plan would likely lead to someone coming up with a fake plan to robustify the world and then claiming that it’d be fine to release their model as open source, because people really want to do open source.
For example, in your plan you write:
Then you set a reasonable time-frame for the vulnerability to be patched: In the case of SHA-1, the patch was “stop using SHA-1” and the time-frame for implementing this was 90 days.
This is exactly the kind of plan that I’m worried about. People will be tempted to argue that surely four years is enough time for the biodefense plan to be implemented. Then four years roll around, the plan is clearly not in place, and they push for release anyway.
I’ll go into more detail later, but as an intuition pump imagine that: the best open source model is always 2 years behind the best proprietary model
You seem to have hypothesised what is to me an obviously unsafe scenario. Let’s suppose our best proprietary models hit upon a dangerous bioweapon capability. Well, now we only have two years to prepare for it, regardless of whether preparing in that time is wildly unrealistic. Worse, this occurs for each and every dangerous capability.
Will evaluators be able to anticipate and measure all of the novel harms from open source AI systems? Sadly, I’m not confident the answer is “yes,” and this is the main reason I only ~50% endorse this post.
When we’re talking about risk management, a 50% chance that a key assumption will work out, when there isn’t a good way to significantly reduce this uncertainty, often doesn’t translate into a 50% chance of the plan being good, but rather a near 0% chance.