I note that the comments here include a lot of debate on the implications of this post’s thesis, on policy recommendations, and on social explanations for why the thesis is true. No commenter has yet disagreed with the actual thesis itself, which is that this paper is a representative example of a field that is “more advocacy than science”, in which a large network of Open Philanthropy Project-funded advocates cite each other in a groundless web of footnotes which “vastly misrepresents the state of the evidence” in service of the party line.
FWIW, I’m a commenter here and I disagree with the exact thesis you stated:
that this paper is a representative example of a field that is “more advocacy than science”, in which a large network of Open Philanthropy Project-funded advocates cite each other in a groundless web of footnotes which “vastly misrepresents the state of the evidence” in service of the party line
I think “more advocacy than science” seems reasonably accurate though perhaps a bit exaggerated.
However, I think ‘a large network of Open Philanthropy Project-funded advocates cite each other in a groundless web of footnotes which “vastly misrepresents the state of the evidence” in service of the party line’ seems pretty inaccurate. In particular, I don’t think current work “vastly misrepresents the state of evidence”, I’m not sold there is a party line, and I’m not convinced that “groundless webs of footnotes” are typical or that groundless.
(I think my original comment does disagree with the thesis you stated though not super clearly. But now, for the record, I am commenting and disagreeing.)
More detailed discussion of disagreements with the thesis stated in the parent comment.
I think that bio evals might be “more advocacy than science” or at least the people running these evals often seem biased. (Perhaps they’ve concluded that bio risk from AI is important for other reasons and are then pushing bio evals which don’t provide much evidence about risk from current models or future models.) This seems sad.
I don’t really see examples of “vastly misrepresenting the state of evidence”. As far as I can tell, the main paper discussed by this post, Open-Sourcing Highly Capable Foundation Models, doesn’t make strong claims about current models? (Perhaps it was edited recently?)
Let’s go through some claims in this paper. (I edited this section to add a bit more context)
The original post mentions:
For instance, as support for the claim that LLMs have great “capabilities in aiding and automating scientific research,” the “Open-Sourcing” paper cites …
...
The paper also says that “red-teaming on Anthropic’s Claude 2 identified significant potential for biosecurity risk”
The corresponding section in the paper (as recently accessed) is:
Biological and chemical weapons development. Finally, current foundation models have shown nascent capabilities in aiding and automating scientific research, especially when augmented with external specialized tools and databases [107, 108]. [...] For example, pre-release model evaluation of GPT-4 showed that the model could re-engineer known harmful biochemical compounds [110], and red-teaming on Anthropic’s Claude 2 identified significant potential for biosecurity risks [56, 111].
The statement “current foundation models have shown nascent capabilities in aiding and automating scientific research, especially when augmented with external specialized tools and databases” seems basically reasonable to me?
The GPT-4 and Anthropic claims seem a bit misleading, though they don’t seem to “vastly misrepresent the state of evidence”. (I think they’re misleading because what GPT-4 did isn’t actually that impressive, as noted in the original post, and the claim of “significant potential” refers to future models rather than Claude 2 itself.)
To be clear, my overall take is:
The original post is maybe somewhat exaggerated and I disagree somewhat with it.
I disagree more strongly with the thesis stated in the parent as it seems like it’s taking a stronger (and imo less accurate) stance than the original post.
Beyond this, I’d note that the paper is centrally discussing “Highly Capable Foundation Models”. It’s notable that, as defined there, current models are not considered highly capable (as other commenters have noted).
Quotes from the paper (emphasis mine):
Abstract:
we argue that for some highly capable foundation models likely to be developed in the near future
Executive summary:
we argue that for some highly capable models likely to emerge in the near future, the risks of open sourcing may outweigh the benefits
“What are Highly Capable Foundation Models” section (page 6):
“Highly capable” foundation models. We define highly capable foundation models as foundation models that exhibit high performance across a broad domain of cognitive tasks, often performing the tasks as well as, or better than, a human.
I think it seems sad that you only get to the actual definition of “Highly Capable Foundation Models” on page 6 rather than in the intro or abstract, given that many people might think that such models aren’t going to be created in the near future.
FWIW: I think you’re right that I should have paid more attention to the current-vs-future-models split in the paper. But I also think that the paper is making… kinda different claims at different times.
Specifically: when it makes true-or-false claims about the world, it talks about models potentially indefinitely far in the future; but when it makes policy recommendations, it talks about things you should start doing soon or now.
For instance, consider part 1 of the conclusion:
1. Developers and governments should recognise that some highly capable models will be too dangerous to open-source, at least initially.
If models are determined to pose significant threats, and those risks are determined to outweigh the potential benefits of open-sourcing, then those models should not be open-sourced. Such models may include those that can materially assist development of biological and chemical weapons [50, 109], enable successful cyberattacks against critical national infrastructure [52], or facilitate highly-effective manipulation and persuasion [88].[30]
The [50] and [109] citations are to the two uncontrolled, OpenPhil-funded papers from my “science” section above. The [30] is to a footnote like this:
Note that we do not claim that existing models are already too risky. We also do not make any predictions about how risky the next generation of models will be. Our claim is that developers need to assess the risks and be willing to not open-source a model if the risks outweigh the benefits.
And like… if you take this footnote literally, then this paragraph is almost tautologically true!
Even I think you shouldn’t open source a model “if the risks outweigh the benefits”; how could I think otherwise? If you take it to be making no predictions about current or next-generation models, well, there’s nothing to object to: a straightforward application of “don’t do bad things.”
But if you take it literally (“do not make any predictions”?), then there’s no reason to actually recommend things the way the next pages do, like saying NIST should provide guidance on whether it’s OK to open-source something, and so on. There are a bunch of very specific suggestions that aren’t the kind of thing you’d be writing about a hypothetical or distant possibility.
And this sits even more uneasily with claims from earlier in the paper: “Our general recommendation is that it is prudent to assume that the next generation of foundation models could exhibit a sufficiently high level of general-purpose capability to actualize specific extreme risks.” (p. 8, !?!). This comes right after the paper discusses the biosecurity risks of Claude. Or: “AI systems might soon present extreme biological risk.” And so on.
I could go on, but in general I think the paper is just… unclear about what it is saying about near-future models.
For purposes of policy, it seems to think that we should spin up things to specifically legislate the next gen; it’s meant to be a policy paper, not a philosophy paper, after all. This holds regardless of whatever disclaimers it includes about not making predictions about the next gen, and it seems especially true when you look at the actual uses to which the paper is put.