[Link and commentary] The Offense-Defense Balance of Scientific Knowledge: Does Publishing AI Research Reduce Misuse?

This is (partly) a linkpost for a paper published earlier this year by Toby Shevlane and Allan Dafoe, both researchers affiliated with the Centre for the Governance of AI. Here’s the abstract:

There is growing concern over the potential misuse of artificial intelligence (AI) research. Publishing scientific research can facilitate misuse of the technology, but the research can also contribute to protections against misuse. This paper addresses the balance between these two effects. Our theoretical framework elucidates the factors governing whether the published research will be more useful for attackers or defenders, such as the possibility for adequate defensive measures, or the independent discovery of the knowledge outside of the scientific community. The balance will vary across scientific fields. However, we show that the existing conversation within AI has imported concepts and conclusions from prior debates within computer security over the disclosure of software vulnerabilities. While disclosure of software vulnerabilities often favours defence, this cannot be assumed for AI research. The AI research community should consider concepts and policies from a broad set of adjacent fields, and ultimately needs to craft policy well-suited to its particular challenges.

The paper is only 8 pages long, and I found it very readable and densely packed with useful insights and models. It also seems highly relevant to the topics of information hazards, differential progress, and (by extension) global catastrophic risks (GCRs) and existential risks. I’d very much recommend reading it if you’re interested in AI research or any of those topics.

Avoiding mentioning GCRs and existential risks

(Here I go on a tangent with relevance beyond this paper.)

Interestingly, Shevlane and Dafoe don’t explicitly use the terms “information hazards”, “differential progress”, “global catastrophic risks”, or “existential risks” in the paper. (Although they do reference Bostrom’s paper on information hazards.)

Furthermore, in the case of GCRs and existential risks, the paper doesn’t even clearly hint at the underlying concepts. My guess is that Shevlane and Dafoe consciously avoided mentioning existential (or global catastrophic) risks, and kept their examples of AI risks relatively “near-term” and “mainstream”, in order to keep the paper accessible and “respectable” to a wider audience. For example, they write:

The field of AI is in the midst of a discussion about its own disclosure norms, in light of the increasing realization of AI’s potential for misuse. AI researchers and policymakers are now expressing growing concern about a range of potential misuses, including: facial recognition for targeting vulnerable populations, synthetic language and video that can be used to impersonate humans, algorithmic decision making that amplifies biases and unfairness, and drones that can be used to disrupt air-traffic or launch attacks [6]. If the underlying technology continues to become more powerful, additional avenues for harmful use will continue to emerge.

That last sentence reads to me as deliberately written so that readers focused on GCRs and existential risks can interpret it as being about such risks, without making the paper seem “weird” or “doomsaying” to other audiences.

My tentative “independent impression” is that it would be better for papers like this to at least somewhat explicitly mention GCRs and existential risks. My rationale is that doing so might draw more attention to such risks, lend work on them some of the respectability that papers like this and their authors have, and more explicitly draw out the particular implications of the work for such risks.

But I can see the argument against that. Essentially, just as the paper and its authors could lend some respectability to work on those risks, some of the “crackpot vibes” of work on such risks might rub off on the paper and its authors. This could limit their positive influence.

And I have a quite positive impression of Dafoe, and now of Shevlane (based on this one paper). So, after updating on the fact that they (seemingly deliberately) steered clear of mentioning GCRs or existential risks, my tentative belief is that this was probably a good choice in this case. But I thought the point was interesting enough to raise, and I ended up writing more about it than I’d planned.

(I’m aware that this sort of issue has been discussed before; this is meant more as a quick take than a literature review. Also, I should note that it’s possible that Shevlane is just genuinely not very interested in GCRs and existential risks, and that that’s the whole reason they weren’t mentioned.)

This post is related to my work with Convergence Analysis, and I’m grateful to David Kristoffersson for helpful feedback, but the views expressed here are my own.