FWIW: I think you’re right that I should have paid more attention to the current vs. future models split in the paper. But I also think that the paper is making… kinda different claims at different times.
Specifically, when it makes its true-or-false claims about the world, it talks about models potentially indefinitely far in the future; but when it talks about policy, it talks about things you should start doing soon or now.
For instance, consider part 1 of the conclusion:
1. Developers and governments should recognise that some highly capable models will be too dangerous to open-source, at least initially.
If models are determined to pose significant threats, and those risks are determined to outweigh the potential benefits of open-sourcing, then those models should not be open-sourced. Such models may include those that can materially assist development of biological and chemical weapons [50, 109], enable successful cyberattacks against critical national infrastructure [52], or facilitate highly-effective manipulation and persuasion [88].[30]
The [50] and [109] citations are to the two uncontrolled, OpenPhil-funded papers from my “science” section above. The [30] is to a footnote like this:
Note that we do not claim that existing models are already too risky. We also do not make any predictions about how risky the next generation of models will be. Our claim is that developers need to assess the risks and be willing to not open-source a model if the risks outweigh the benefits.
And like… if you take this footnote literally, then this paragraph is almost tautologically true!
Even I think you shouldn’t open-source a model “if the risks outweigh the benefits”; how could I think otherwise? If you take it to be making no predictions about current or next-generation models, well, there’s nothing to object to. Straightforward application of “don’t do bad things.”
But if you take it literally (“do not make any predictions”?), then there’s no reason to actually recommend things the way the next pages do, like saying NIST should provide guidance on whether it’s OK to open-source something, and so on. There are a bunch of very specific suggestions that aren’t the kind of thing you’d be writing about a hypothetical or distant possibility.
And this sits even more uneasily with claims from earlier in the paper: “Our general recommendation is that it is prudent to assume that the next generation of foundation models could exhibit a sufficiently high level of general-purpose capability to actualize specific extreme risks” (p. 8, !?!). This comes right after the paper discusses the biosecurity risks of Claude. Or: “AI systems might soon present extreme biological risk.” And so on.
I could go on, but in general I think the paper is just… unclear about what it is saying about near-future models.
For purposes of policy, it seems to think that we should spin up things specifically to legislate the next generation of models; it’s meant to be a policy paper, not a philosophy paper, after all. This is true regardless of whatever disclaimers it includes about not making predictions about the next generation. And it seems especially true when you look at the actual uses to which the paper is put.