It has been claimed that there’s no fire alarm for AGI, that is, there will be no specific moment or event at which AGI risk becomes sufficiently obvious and agreed upon, so that freaking out about AGI becomes socially acceptable rather than embarrassing. People often implicitly argue for waiting for an (unspecified) future event that tells us AGI is near, after which everyone will know that it’s okay to work on AGI alignment. This strategy seems particularly bad if no such future event (i.e. fire alarm) exists.
This post argues that this is not in fact the implicit strategy that people typically use to evaluate and respond to risks. In particular, the "wait for a fire alarm" model is too discrete. Instead, people perform “the normal dance of accumulating evidence and escalating discussion and brave people calling the problem early and eating the potential embarrassment”. As a result, the existence of a “fire alarm” is not particularly important.
Note that the author does agree that there is some important bias at play here. The original fire alarm post is implicitly considering a _fear shame hypothesis_: people tend to be less cautious in public, because they expect to be negatively judged for looking scared. The author ends up concluding that there is something broader going on and proposes a few possibilities, many of which still suggest that people will tend to be less cautious around risks when they are observed.
Some points made in the very detailed, 15,000-word article:
1. Literal fire alarms don’t work by creating common knowledge, or by providing evidence of a fire. People frequently ignore fire alarms. In one experiment, participants continued to fill out questionnaires while a fire alarm rang, often assuming that someone would lead them outside if it were important.
2. They probably instead work by a variety of mechanisms, some of which are related to the fear shame hypothesis. Sometimes they provide objective evidence that is easier to use as a justification for caution than a personal guess. Sometimes they act as an excuse for cautious or fearful people to leave, without the implication that those people are afraid. Sometimes they act as a source of authority for a course of action (leaving the building).
3. Most of these mechanisms are amenable to partial or incremental effects, and in particular can happen with AGI risk. There are many people who have already boldly claimed that AGI risk is a problem. There exists person-independent evidence; for example, surveys of AI researchers suggest a 5% chance of extinction.
4. For other risks, there does not seem to have been a single discrete moment at which it became acceptable to worry about them (i.e. no “fire alarm”). This includes risks where there has been a lot of caution, such as climate change, the ozone hole, recombinant DNA, COVID, and nuclear weapons.
5. We could think about _building_ fire alarms; many of the mechanisms above are social ones rather than empirical facts about the world. This could be one out of many strategies that we employ against the general bias towards incaution (the post suggests 16).
Planned opinion:
I enjoyed this article quite a lot; it is _really_ thorough. I do see a lot of my own work as pushing on some of these more incremental methods for increasing caution, though I think of it more as a combination of generating more or better evidence, and communicating arguments in a manner more suited to a particular audience. Perhaps I will think of new strategies that aim to reduce fear shame instead.
Planned summary for the Alignment Newsletter: