Information hazards: Why you should care and what you can do
This post was written for Convergence Analysis.
Overview
We argue that many people should consider the risk that they could cause harm by developing or sharing (true) information. We think that harm from such information hazards may sometimes be very substantial, and that this applies especially to people who research advanced technologies and/or catastrophic risks, or who often think about such technologies and risks.
However, constantly worrying about information hazards would be paralyzing and unnecessary. We therefore outline a heuristic for quickly identifying whether, in a given situation, it’s worth properly thinking about the hazards some information might pose, and about how to act given those hazards. This heuristic is based on the “potency” and “counterfactual rarity” of the information in question. We next outline, and give examples to illustrate, a range of actions one could take when one has identified that some information could indeed be hazardous.
Why should some people care about information hazards?
An information hazard is “A risk that arises from the dissemination or the potential dissemination of (true) information that may cause harm or enable some agent to cause harm” (Bostrom). There are many types of information hazards. Here are some examples Bostrom gives:
Data hazard: Specific data, such as the genetic sequence of a lethal pathogen or a blueprint for making a thermonuclear weapon, if disseminated, create risk.
[...] Idea hazard: A general idea, if disseminated, creates a risk, even without a data-rich detailed specification.
For example, the idea of using a fission reaction to create a bomb, or the idea of culturing bacteria in a growth medium with an antibiotic gradient to evolve antibiotic resistance, may be all the guidance a suitably prepared developer requires; the details can be figured out.
[...] Attention hazard: mere drawing of attention to some particularly potent or relevant ideas or data increases risk, even when these ideas or data are already “known”.
Because there are countless avenues for doing harm, an adversary faces a vast search task in finding out which avenue is most likely to achieve his goals. Drawing the adversary’s attention to a subset of especially potent avenues can greatly facilitate the search. For example, if we focus our concern and our discourse on the challenge of defending against viral attacks, this may signal to an adversary that viral weapons—as distinct from, say, conventional explosives or chemical weapons—constitute an especially promising domain in which to search for destructive applications.
These and other examples highlight just how large the negative impacts of true information can sometimes be—large enough to increase global catastrophic risks, or perhaps even existential risks. It is worth further highlighting two underlying reasons why information could often have such large, negative impacts:
If an existential catastrophe occurs, it is most likely to be for anthropogenic rather than natural reasons and, more specifically, seems most likely to result from the consequences of technological developments. Both the development of technology and the effects technology has on the world depend in large part on the development and sharing of information. Thus, it seems plausible that, in the worst cases, information hazards could substantially increase existential risks.
While information can be forgotten or can drift into obscurity, it isn’t really possible to “un-develop” or “un-share” information. That is, developing and/or sharing information is a fairly irreversible decision, with potentially very long-lasting and resilient impacts (see also the unilateralist’s curse).
Altogether, we believe that, in some situations (e.g., Bostrom’s examples), developing[1] or sharing information could have an expected negative impact that is larger than the expected positive impact the developer/sharer will achieve through everything else they do in their lives. And we believe that information hazards that are simply very large (rather than so large that they overwhelm the rest of one’s impact) will be much more common.
Thus, at least for some people, it seems like an important way to make their impact more positive would be to reduce the chances that they develop or share hazardous information. This could be just as important as more typical ways to “have a positive impact”.[2]
But it is obviously also true that developing and sharing information is often very beneficial, and that constantly worrying about information hazards would be counterproductive. Thus, in this post we will offer:
a heuristic for quickly identifying whether, in a given situation, it’s worth properly thinking about the hazards some information might pose, and about how to act given those hazards
a list of actions one could consider when one has identified that some information could indeed be hazardous
Should you care about information hazards?
But who is this relevant to? More specifically, for whom is this topic relevant enough to be worth spending time learning and thinking about? We think that only a small proportion of people will develop information that poses catastrophic or existential risks. However:
We think that there may be an unusually large proportion of such people in the effective altruist and rationalist communities.
We think that the expected value of being aware of this topic could be high even for people who have only a small chance of developing information that poses such severe risks, given how severe those risks would be.
We think that a much larger proportion of people will develop information that poses at least somewhat substantial risks (even if these risks aren’t catastrophic or existential).
It’s possible to cause substantial harm merely by sharing and drawing attention to information that was already developed by others, and that is already known in some circles. We think that many more people will find themselves in a position to cause harm by sharing information than by initially developing information.
For example, one might learn about a technology which is already mentioned in academic journals, and then try to highlight the dangers of this technology to national security policymakers who wouldn’t have read the journal articles. This could increase the chances that this technology will be weaponised, thus causing harm. (This is an attention hazard.)
Many people with malicious intent may not be particularly intelligent or creative, so it may be surprisingly easy to develop or learn about information that such people haven’t thought of yet. Developing or sharing that information could then result in these people learning about it and using it to cause harm.
For people and situations where it’s not worth properly thinking about the hazards some information might pose, the heuristic we provide in the next section can quickly identify that fact.
Ultimately, we’d say that, as a rough rule of thumb, this topic is relevant enough that it’s worth spending time learning and thinking about it for people who research advanced technologies and/or catastrophic risks, or who often think about such technologies and risks. We think that such people have non-negligible odds of, at some point, developing or learning of information that poses substantial risks.[3]
When should you think about information hazards?
We’ve argued that it’s worthwhile for people who research advanced technologies and/or catastrophic risks, or who often think about such technologies and risks, to learn and think about the topic of information hazards. But it’d be paralyzing and unnecessary for such people to always worry about information hazards. So when, more specifically, should you take the time to properly think about the hazards some information might pose, and about how to act given these hazards?[4]
We propose a simple heuristic to answer this question:
If some information you may develop or share is related to advanced technologies or catastrophic risks, then think about how “potent” the information is.
If the information’s potency is low, don’t worry about the risk of information hazards; you can proceed with developing or sharing the information.
If the information’s potency is high, consider how “counterfactually rare” this information is.
If the counterfactual rarity is low, then consider using “implementation-related responses”.
If the counterfactual rarity is high, then think further about how hazardous this information might be, and consider using “information-related responses”.
If the information in question isn’t related to advanced technologies or catastrophic risks, don’t worry.
We will now explain the terms and ideas in that heuristic in more detail.
The potency of some information depends on factors such as how many people the information may affect, how intensely each affected person would be affected, how long the effects would last, and so on. For example, the information that it’s possible to create a nuclear weapon is much more potent than information about what you had for breakfast. (As such, there’s no need to worry about the risks posed by information about what you had for breakfast—feel free to share that with whoever might care, and then get on with your day.)
Counterfactual rarity essentially refers to how few people are likely to have already developed or learned this information (or similar information), or to develop or learn it soon anyway. This depends on factors such as how much specialised knowledge is required to arrive at the information, how counterintuitive the information is, the incentives for developing and sharing the information, and so on.
Counterfactual rarity is important because, however potent some information is, the impact of you developing or sharing that information will depend on whether the information is already widely known, and whether it’s very likely to be discovered and publicised soon in any case. For example, the information that it’s possible to create a nuclear weapon is now very widely known, so, even though the information is very potent, it’s no longer worth worrying about it as an information hazard.
Instead, in such cases, we should focus on what we could call implementation-related responses. By this, we essentially mean any actions for reducing risks other than what we might call an “information-related response” (discussed below). Examples of implementation-related responses to the risks posed by nuclear weapons include trying to control access to nuclear materials and trying to establish norms against the creation of nuclear weapons. This seems more valuable (nowadays) than the information-related response of trying to hide the fact that nuclear weapons can be created. This is because the key bottlenecks to causing harm using nuclear weapons are now related to implementing the information, rather than to accessing the information in the first place.
Some information will have high potency and high counterfactual rarity. This could include, for example, information about a new way to engineer a virus. These are the cases in which it’s worth thinking about how hazardous the information might be.
Additionally, in such cases, you should probably consider using information-related responses. By this we mean actions intended to prevent the development or (further) spread of potentially hazardous information, or to influence how the information is developed or (further) spread. For example, you might avoid conducting a line of research that could reveal a new way to engineer a virus, or push for research of that type to require review before it’s conducted, or discourage discussion of such research outside of academic publications. (Note that our purpose here is to illustrate these ideas, rather than to recommend specific actions in response to specific, actual situations.)[5]
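To make the flow of this heuristic easier to follow, here is a minimal sketch of it as a decision function (in Python; the function, parameter, and category names are purely illustrative choices of ours, not an established tool or API):

```python
from enum import Enum, auto

class Response(Enum):
    PROCEED = auto()                  # develop/share without worrying further
    IMPLEMENTATION_RELATED = auto()   # e.g., control access to materials, build norms
    INFORMATION_RELATED = auto()      # think further; consider limiting development/sharing

def triage_information(relates_to_tech_or_catastrophic_risks: bool,
                       potency_is_high: bool,
                       counterfactual_rarity_is_high: bool) -> Response:
    """Rough sketch of the heuristic described above.

    The three inputs are judgement calls rather than precise measurements;
    the point is only to decide whether a situation deserves further thought.
    """
    if not relates_to_tech_or_catastrophic_risks:
        return Response.PROCEED
    if not potency_is_high:
        return Response.PROCEED
    if not counterfactual_rarity_is_high:
        return Response.IMPLEMENTATION_RELATED
    return Response.INFORMATION_RELATED

# Example: "it's possible to build a nuclear weapon" is highly potent but no
# longer counterfactually rare, so the sketch points to implementation-related
# responses rather than information-related ones.
print(triage_information(True, True, False))  # Response.IMPLEMENTATION_RELATED
```

Of course, in practice potency and counterfactual rarity are matters of degree rather than yes/no answers, so treat this as an illustration of the flow rather than as a tool to be applied mechanically.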
The following diagram visually represents our heuristic:
Note that both this heuristic and the response options we discuss below are intended for use at an individual level, to reduce information hazards from one’s own actions. We think that many of the same ideas would apply at an organisational level, or if trying to analyse and reduce the information hazards other people’s actions might cause, but that some modifications would have to be made.[6]
For information with good consequences
Although it’s not our focus, we should also quickly note that essentially the mirror image of this heuristic process can be followed in relation to information you suspect may have good consequences.
First, consider whether the information is highly potent. If it isn’t (e.g., if the information is about how many marbles there are in a jar), then don’t bother further assessing the potential impacts of this information.
If the information is highly potent, then consider whether it’s counterfactually rare. If it’s not counterfactually rare, consider using implementation-related responses. For example, the information that washing one’s hands is a good idea is highly potent, but is now very widely known. As such, in many communities, trying to spread the information further may have relatively little value. However, helping people use the information (e.g., by providing access to clean water) may have high value.
If the information is highly potent and counterfactually rare, then it’s worth thinking about the benefits that may result from developing and sharing the information (in a mirror image of thinking about the possible harms from hazardous information). Additionally, in such cases, you should probably consider using information-related responses. For example, if the information is a cure for cancer, it may be worth considering developing the information via research, or, if the information has recently been developed, spreading the information to more doctors and patients.
What can you do about information hazards?
Let’s say you’ve used the above heuristic, and determined that the information in question is indeed high in both potency and counterfactual rarity. You know this means that you should probably think further about how hazardous this information might be, and consider using “information-related responses”. But what specific responses can you employ?
We will now provide a (non-exhaustive) list of responses that may often be worth considering when dealing with potential information hazards, along with examples to illustrate each response. Note that it would sometimes be possible to combine multiple responses (i.e., they are not mutually exclusive).
These responses are very approximately ordered from the least extreme responses, worth considering when the expected harms from developing or sharing the information are relatively low, to the most extreme responses. (But we should note that, for the most part, this post won’t offer very specific guidelines on which response to take in particular situations, or how to decide that. We hope to explore those questions more in future work.)
Potential responses
Develop and/or share the information: The information is sufficiently potent and counterfactually rare that it was worth thinking about the risks, but you conclude that the risks are low enough (compared to the benefits) that you can go ahead and develop and/or share the information.
For example, an AI safety researcher may simply carry on with their research, and with communicating it as if it posed no information hazards, because the balance of risks to benefits seems acceptable to them.
Develop the information, but don’t (yet) share it: Similar to the above, except that you only proceed with developing the information, concluding that sharing it is not worthwhile, or that it’d be better to wait until later to decide whether to share it.
For example, an AI safety researcher may think that their research has sufficiently low risks, and sufficiently high benefits, that it’s worth them proceeding with the research. However, the researcher may also think that sharing the results would be net negative, because some other people could use the results to enhance capabilities of unsafe systems. Or the researcher may decide that sharing the results might be net negative, so they’ll wait to see the full results before deciding whether to share anything about their research.
Think more about the risks: Consider in more detail how risky the information might be. You might do this while continuing to develop and/or share the information, or before doing so (with the results of your thinking determining whether to do so).
An AI safety researcher decides they’re too uncertain about the risks and benefits of the research they had intended to do (or about sharing the results they’ve found) to proceed with that. As such, they first try to think very carefully about those risks and benefits.
Frame the information to reduce risks: Make a conscious effort to frame or explain the information in a way that reduces its risks, such as by influencing how the information is likely to be used or who it is likely to reach.
After many interesting conversations at EA Global, an EA comes up with a new argument for why certain technologies could be very powerful and dangerous. The EA shares their argument, but emphasises how this highlights the importance of caution and differential progress. They also deliberately use somewhat dry, academic language, and avoid particularly dramatic or militaristic language. This is all to reduce the chances of drawing to these technologies the wrong kind of attention from the wrong kind of people.
Develop and/or share a subset of the information: Instead of developing and/or sharing none of the information, or all of it, work out what part of the information would be net-beneficial to develop and/or share, and develop and/or share that part only. (If doing this, it’s worth thinking carefully about whether the part you’re planning to develop and/or share would be sufficient for others to (re)construct the other parts of the information as well.)
The EA from the previous example decides to share the parts of their argument that focus on the risks of catastrophic accidents from certain technologies, but not the parts that focus on the risks of catastrophic misuse of those technologies. This is because the EA concludes that the recommendations they want to make can be supported without mentioning the misuse risks, and that drawing attention to the misuse risks could make such misuse more likely.
Share the information with a subset of people: Do share the information, but not indiscriminately. Instead, work out which people it’s likely to be net-beneficial to share the information with, and share it with just these people. (One reason to do this would be to get these people’s opinion on how dangerous it would be to further develop and/or share the information. Another would be to allow these people to develop defences against whatever harms the information relates to or could cause.)
An AI safety researcher shares a preliminary finding with a handful of trusted and more experienced researchers, to get advice on whether it’s wise to proceed with this line of research, and whether it’s wise to make it public.
A biology student discovers a weakness in the health system that could potentially be exploited by terrorists. They report this to the relevant authorities, but don’t discuss the information publicly.
Avoid developing and/or sharing the information: If the information poses high enough risks (relative to its potential benefits), it may be best to simply avoid developing and/or sharing it at all.
An AI safety researcher comes to the conclusion that a certain line of research could substantially advance capabilities of unsafe systems, without advancing safety to as great an extent. The researcher thus just drops that line of research.
Monitor whether others may develop and/or share the information: For sufficiently risky information, it may be worth actively investigating whether anyone else is developing or sharing the information, or whether anyone seems likely to do so in future. If you identify any such people, you might then warn them of the potential risks, let the relevant authorities know that these people may develop or share this information, and/or try to make it harder for the information to be acted on if it is developed and shared (i.e., try to use implementation-related responses). You might do all this after having stopped developing and/or sharing the information yourself, or while continuing to do so (probably very cautiously).
An AI safety researcher concludes that a certain line of research might be very unsafe. The researcher decides that they should check published papers and ask around in very vague and careful ways to see if anyone else is working on a similar line of research. The researcher identifies a research group that is indeed doing so, and reaches out to that group to discuss reasons this research may be too risky to be worth pursuing.
Decrease the likelihood of others developing and/or sharing the information: This overlaps with parts of the above response option. But there are also ways of doing this that don’t involve monitoring whether others may develop and/or share the information. For example, you could delete your own research notes, or try to steer people away from books, articles, concepts, etc. that helped lead you to the information.
An AI safety researcher concludes that a certain line of research might be very unsafe, so they destroy all their writings related to that research, as well as their earlier writings on ideas that led them to that line of research. They also think back over what previous work and concepts from others helped inspire and lead them to that line of research, and try to avoid mentioning or recommending such work and concepts to others in future, to decrease the odds that others get inspired and led to that research in the same way.
There are also two potential responses that seem to us like they could arguably be classified either as implementation-related responses or as information-related responses. These two responses are the following:
Develop countermeasures: Develop ways of preventing, mitigating, or fixing the potential harms of the information. You might do all this after having stopped developing and/or sharing the information yourself, or while continuing to do so (perhaps to inform your efforts to develop countermeasures).
A biology student discovers a weakness in the health system that could potentially be exploited by terrorists. The student tries to think of ways to patch up that weakness, or to respond rapidly if the weakness is exploited. To help with that effort, they also continue very carefully researching that weakness, and contact some very carefully selected people to help.
Improve the groups that might discover or use the information: Try to improve the values (e.g., level of altruism) and/or capabilities of the people or organisations who might develop, learn about, or implement the information. This would be done to reduce the risks that they will share or use the information in harmful ways. (See also Improving the future by influencing actors’ benevolence, intelligence, and power.)
An AI policy researcher decides that it would theoretically be good for some policymakers to learn of their findings, but that that really depends on how altruistic, thoughtful, competent, and good at keeping secrets those policymakers are. They decide to enlist help in trying to improve some policymakers’ values and capabilities first, and then perhaps share the information with these policymakers later.
Again, we emphasise that this list is probably not exhaustive, that some of the response options could be used in combination, and that we aren’t here making specific suggestions on when to use which response. Note also that one may often be able to switch from one response option to another later (e.g., from sharing with a subset of people to sharing publicly), especially if one has started with a relatively cautious option.
Conclusion
We believe that many people have a significant chance of at some point being in a position to develop and/or share information that could cause substantial harm. We think that this is especially true for people who research advanced technologies and/or catastrophic risks, or who often think about such technologies and risks. Thus, we think that responding well to potential information hazards can be a very important way for such people to have a more positive impact on the world.
We suggested a heuristic to use when one is facing a potential information hazard: first ask yourself whether the information has high potency, and then whether it has high counterfactual rarity. If the information isn’t highly potent, you don’t have to worry. If the information is highly potent but not counterfactually rare, you should consider using “implementation-related responses” (e.g., limiting access to uranium). If the information is highly potent and counterfactually rare, then you should think further about whether this information could be hazardous, and consider using “information-related responses”.
We then outlined a set of information-related responses one could consider, which we hope will guide readers in making more informed choices when facing potential information hazards.
There are two key things that this post has not done:
Given clear guidelines on which response options to use in which situations, and/or how to think through that question.
Clearly integrated our ideas with those developed by others (e.g., here, here, and here).
We hope to explore those avenues further in future work.
This post was written for Convergence by MichaelA and Justin Shovelain, based on an earlier post written by Justin and Andrés Gómez Emilsson. We’re also grateful to David Kristoffersson, Aaron Gertler, and Will Bradshaw for helpful comments and edits on earlier drafts, and to Anders Sandberg and Ben Harack for helpful discussions on the general topic.
[1] By “developing”, we mean things like “coming up with” or “independently discovering through research”. Note that, with our usage of the term “develop”, it’s possible to consider the potential dangers of some information that you might develop but haven’t yet developed. For example, before starting research that you expect might reveal a new way to engineer a virus, you could already assess how dangerous the results of that research might be (though this assessment would of course be quite abstract and approximate).
[2] We would in fact further argue that there’s no relevant moral difference between “having a positive impact” and “reducing one’s negative impact”, and we hope to write about that point in the future. But that point isn’t necessary for our arguments in this post.
[3] We’re also inclined to think that it would be possible to modify the basic ideas in this post so as to make them relevant to various other sets of people, and potentially to all people. But the people we’re most interested in addressing, and for whom we think this post’s version of these ideas is most useful, are indeed people who research advanced technologies and/or catastrophic risks, or who often think about such technologies and risks.
[4] An alternative framework to ours for answering a somewhat related question is provided in The Offense-Defense Balance of Scientific Knowledge: Does Publishing AI Research Reduce Misuse? (which we read after writing this post). What that article calls “Counterfactual possession” is somewhat similar to what we call “Counterfactual rarity”.
[5] One alternative, or additional, approach for deciding whether to worry about information hazards in a particular situation would be the following: Consider how similar the information under consideration is to various types of information hazards that have been described (such as by Bostrom or by Crawford et al.), and whether this highlights a mechanism through which the information could cause harm. For example, you could recall the concept of an attention hazard, and then think about whether the article you’re considering writing could end up raising some information to the wrong people’s attention in a way that causes harm.
[6] There are also some valuable actions one can take to build one’s general capacity for avoiding or mitigating information hazards. Learning about information hazards and how to handle them is one such action. This article touches on some other actions for building that general capacity.
If you’re doing things in a group, rather than alone, useful subsets of this framework could be the standard OPSEC process and controls for classified information. There are some pretty big Chesterton’s Fences around them.
The OPSEC process is meant specifically for situations where you’re planning a specific activity, where the value to the adversary of information about your plans will diminish rapidly once you conclude that activity, but where, in the meantime, any hint as to your plans might be detrimental. So it’s more of a set of guidelines than a specific policy or procedure, and it encourages thinking about how many decibels of probability you’re allowing the adversary access to.
Controls for classified information are meant for information that will be harmful even after the conclusion of a specific activity. They’re the converse of the OPSEC process: a large collection of highly detailed policies and procedures for marking and protecting information. They’re certainly a bit heavyweight for independent research groups smaller than the Manhattan Project, but some principles could apply, like a central classification authority to reduce the cognitive load of marking your products, and uniform procedures for handling products with each level of marking.
Here are some relevant thoughts from Andrew Critch on an FLI podcast episode I just heard (though it was released in 2017):
Promote ideas that make the information hazard seem ridiculous or uninteresting. An example that may or may not be happening: the US government enabling stories of extraterrestrial origin to hide the possibility that they have unreasonably advanced aerospace technology, materially, by encasing it in dumb glowy saucer stuff that doesn’t make any sense. (A probably fictional example is good, because if someone was smart enough and motivated enough to hide something like this, I probably wouldn’t want to tell people about it. If this turns out not to be fictional, US government, I’m very sorry; we haven’t thought enough about this to understand why you’d want to hide it.)
If the information hazard concerned is going to be around for a long time, you might want to consider constructing an ideological structure that systematically hides the information hazard, under which the only people who get anywhere near questioning enough of their assumptions to find the information hazard also tend to be responsible enough to take it, and under which the spread of the information hazard is universally limited. Cease speaking the words that make it articulable. It should be noted that this won’t look, from the inside, like a conspiracy. There will not be a single refutation of the idea under this ideology, because no one would think to write it. It will just seem naturally difficult for most people living under it to notice how the idea might ever be important.
There’s also trying to help improve their ability to handle such information.