[Link and commentary] The Offense-Defense Balance of Scientific Knowledge: Does Publishing AI Research Reduce Misuse?
This is (partly) a linkpost for a paper published earlier this year by Toby Shevlane and Allan Dafoe, both researchers affiliated with the Centre for the Governance of AI. Here’s the abstract:
There is growing concern over the potential misuse of artificial intelligence (AI) research. Publishing scientific research can facilitate misuse of the technology, but the research can also contribute to protections against misuse. This paper addresses the balance between these two effects. Our theoretical framework elucidates the factors governing whether the published research will be more useful for attackers or defenders, such as the possibility for adequate defensive measures, or the independent discovery of the knowledge outside of the scientific community. The balance will vary across scientific fields. However, we show that the existing conversation within AI has imported concepts and conclusions from prior debates within computer security over the disclosure of software vulnerabilities. While disclosure of software vulnerabilities often favours defence, this cannot be assumed for AI research. The AI research community should consider concepts and policies from a broad set of adjacent fields, and ultimately needs to craft policy well-suited to its particular challenges.
The paper is only 8 pages long, and I found it very readable and densely packed with useful insights and models. It also seems highly relevant to the topics of information hazards, differential progress, and (by extension) global catastrophic risks (GCRs) and existential risks. I’d very much recommend reading it if you’re interested in AI research or any of those three topics.
Avoiding mentioning GCRs and existential risks
(Here I go on a tangent with relevance beyond this paper.)
Interestingly, Shevlane and Dafoe don’t explicitly use the terms “information hazards”, “differential progress”, “global catastrophic risks”, or “existential risks” in the paper. (Although they do reference Bostrom’s paper on information hazards.)
Furthermore, in the case of GCRs and existential risks, even the concepts are not clearly hinted at. My guess is that Shevlane and Dafoe were consciously avoiding mention of existential (or global catastrophic) risks, and keeping their examples of AI risks relatively “near-term” and “mainstream”, in order to keep their paper accessible and “respectable” for a wider audience. For example, they write:
The field of AI is in the midst of a discussion about its own disclosure norms, in light of the increasing realization of AI’s potential for misuse. AI researchers and policymakers are now expressing growing concern about a range of potential misuses, including: facial recognition for targeting vulnerable populations, synthetic language and video that can be used to impersonate humans, algorithmic decision making that amplifies biases and unfairness, and drones that can be used to disrupt air-traffic or launch attacks [6]. If the underlying technology continues to become more powerful, additional avenues for harmful use will continue to emerge.
That last sentence felt to me like it was meant to be interpretable as about GCRs and existential risks, for readers who are focused on such risks, without making the paper seem “weird” or “doomsaying” to other audiences.
I think my tentative “independent impression” is that it’d be better for papers like this to include at least some, somewhat explicit mentions of GCRs and existential risks. My rationale is that this might draw more attention to such risks, lend work on those risks some of the respectability that papers like this (and their authors) have, and more explicitly draw out the particular implications of this work for such risks.
But I can see the argument against that. Essentially, just as the paper and its authors could lend some respectability to work on those risks, some of the “crackpot vibes” of work on such risks might rub off on the paper and its authors. This could limit their positive influence.
And I have a quite positive impression of Dafoe, and now of Shevlane (based on this one paper). Thus, after updating on the fact that they (seemingly purposely) steered clear of mentioning GCRs or existential risks, my tentative “belief” would be that that was probably a good choice, in this case. But I thought it was an interesting point worth raising, and I accidentally ended up writing more about it than planned.
(I’m aware that this sort of issue has been discussed before; this is meant more as a quick take than a literature review. Also, I should note that it’s possible that Shevlane is just genuinely not very interested in GCRs and existential risks, and that that’s the whole reason they weren’t mentioned.)
This post is related to my work with Convergence Analysis, and I’m grateful to David Kristoffersson for helpful feedback, but the views expressed here are my own.
I thought the discussion of how different fields face different tradeoffs with regard to disclosing vulnerabilities was particularly interesting:
Good topic for a paper. I wonder if publishing risk analysis frameworks itself winds up being somewhat counterproductive, by causing less original thought to be applied to a novel threat and more box-checking/ass-covering/transfer of responsibility.
I can imagine that happening in some cases.
However, this particular framework felt to me much more like a way of organising one’s thoughts and highlighting considerations to attend to, making things a bit less nebulous and abstract, rather than boiling things down to a list of very precise, very simple “things to check” or “boxes to tick”. My guess is that a framework of this form is more likely to guide thought (and possibly even prompt thought, because it makes you feel you can get some traction on an otherwise extremely fuzzy problem), and less likely to lead to people just going through the motions.
Also, as a somewhat separate point, I find it plausible that even a framework that is very “tick-box-y” could still be an improvement in cases where, by default, people have very little direct incentive to think about risks/safety, and the problem is very complicated. If the alternative is “almost no thought about large-scale risks” rather than “original and context-relevant thought about large-scale risks”, then even just ass-covering and box-checking could be an improvement.
I also have a prior that in some cases, even very intelligent professionals in very complicated domains can benefit from checklists. This is in part based on Atul Gawande’s work in the healthcare setting (which I haven’t looked into closely):
(On the other hand, I definitely also found the “tick-box” mindset instilled/required at the school I previously worked at infuriating and deeply counterproductive. But I think that was mostly because the boxes to be ticked were pointless; I think a better checklist, which remained somewhat abstract and focused on principles rather than specific actions, would’ve been possible.)
Good points, I agree.