I appreciate that you are putting thought into this. Overall I think that “making the world more robust to the technologies we have” is a good direction.
In practice, how does this play out?
Depending on the exact requirements, I think this would most likely amount to an effective ban on future open-sourcing of generalist AI models like Llama2 even when they are far behind the frontier. Three reasons that come to mind:
1. The set of possible avenues for “novel harms” is enormous, especially if the evaluation involves “the ability to finetune [...], external tooling which can be built on top [...], and API calls to other [SOTA models]”. I do not see any way to clearly establish “no novel harms” with such a boundless scope. Heck, I don’t even expect proprietary, closed-source models to be found safe in this way.
2. There are many, many actors in the open-source space, working on many, many AI models (even just fine-tunes of LLaMA/Llama2). That is kind of the point of open sourcing! It seems unlikely that outside evaluators would be able to evaluate all of these, or for all these actors to do high-quality evaluation themselves. In that case, this requirement turns into a ban on open-sourcing for all but the largest & best-resourced actors (like Meta).
3. There aren’t incentives for others to robustify existing systems or to certify “OK you’re allowed to open-source now”, in the way there are for responsible disclosure. By default, I expect those steps to just not happen, & for that to chill open-sourcing.
To clarify, I’m imagining that this protocol would be applied to the open-sourcing of foundation models. Probably you could operationalize this as “any training run which consumed > X compute” for some judiciously chosen X.
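For concreteness, here is a minimal sketch of what that compute trigger could look like, assuming the common ~6 × parameters × tokens approximation for dense-transformer training FLOPs; the threshold value, function names, and example numbers are hypothetical illustrations, not proposed values.

```python
# Minimal sketch of a "> X training compute" trigger for the protocol.
# Assumes the rough ~6 FLOPs per parameter per training token estimate
# for dense transformers; X below is a hypothetical placeholder.

TRAINING_FLOPS_THRESHOLD_X = 1e25  # hypothetical X, to be judiciously chosen


def estimated_training_flops(n_params: float, n_tokens: float) -> float:
    """Rough training-compute estimate: ~6 * parameters * tokens."""
    return 6.0 * n_params * n_tokens


def falls_under_protocol(n_params: float, n_tokens: float) -> bool:
    """True if open-sourcing this run would require the pre-release evaluation."""
    return estimated_training_flops(n_params, n_tokens) > TRAINING_FLOPS_THRESHOLD_X


# Example: a ~70B-parameter model trained on ~2T tokens comes out to
# roughly 8.4e23 FLOPs, below the placeholder threshold above.
print(falls_under_protocol(70e9, 2e12))  # False
```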