Precise P(doom) isn’t very important for prioritization or strategy
People spend time trying to determine the probability that AI will become an existential risk, sometimes referred to as P(doom).
One point that I think gets missed in this discussion is that a precise estimate of P(doom) isn’t that important for prioritization or strategy.
I think it’s plausible that P(doom) is greater than 1%. For prioritization, even a 1% chance of existential catastrophe from AI this century would be sufficient to make AI the most important existential risk. The probability of existential catastrophe from nuclear war, pandemics, and other catastrophes seems lower than 1%. Identifying exactly where P(doom) lies in the 1%-99% range doesn’t change priorities much.
As with AI timelines, it’s unclear that changing P(doom) would change our strategy towards alignment. Changing P(doom) shouldn’t dramatically change which projects we focus on, since we probably need to try as many things as possible, and quickly. I don’t think the list of projects or the resources we dedicate to them would change much in the 1% or 99% worlds. Are there any projects that you would robustly exclude from consideration if P(doom) were 1-10% but include if it were 90-99% (and vice versa)?
I think communicating P(doom) can be useful for other reasons like assessing progress or getting a sense of someone’s priors, but it doesn’t seem that important overall.
Speaking as someone who does work on prioritization, this is the opposite of my lived experience: robust, broadly credible values for P(doom) would be incredibly valuable. I would happily accept them over billions of dollars for risk reduction and consider civilization’s prospects substantially improved.
These sorts of forecasts are critical to setting budgets and impact thresholds across cause areas, and even more crucially, to determining the sign of interventions, e.g. in arguments about whether to race for AGI with less concern about catastrophic unintended AI action. The relative magnitude of the downside of unwelcome use of AGI by others vs. accidental catastrophe is critical to how AI companies and governments will decide how much risk of accidental catastrophe to take, whether AI researchers decide to bother with advance preparations, how much they will be willing to delay deployment for safety testing, etc.
Holden Karnofsky discusses this:
This is surprising to me! If I understand correctly, you would prefer knowing for certain that P(doom) was (say) 10% over spending billions on reducing x-risks? (Perhaps this comes down to a difference in our definitions of P(doom).)
Like Dagon pointed out, it seems more useful to know how much you can change P(doom) than to know its precise level. For example, treating AI risk as a single step, reducing P(doom) from 99% to 90% or from 10% to 1% adds the same 9 percentage points of survival probability, and so the same expected value; what matters is the size of the reduction we can achieve, not whether the risk started at 10% or 99%.
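To make that arithmetic explicit (a toy sketch; $V$ is just shorthand for the value of a future that avoids doom, not notation from the thread):

$$
\mathbb{E}[\text{value}] = \big(1 - P(\text{doom})\big)\,V, \qquad \Delta\mathbb{E}[\text{value}] = \big(P_{\text{before}} - P_{\text{after}}\big)\,V.
$$

Both reductions give $\Delta\mathbb{E}[\text{value}] = 0.09\,V$: the gain depends only on how many percentage points of risk are removed, not on the starting level.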
For prioritization within AI safety, are there projects that you would stop funding as P(doom) goes from 1% to 10% to 99%? I personally would want to fund all the projects I could, regardless of P(doom) (with resources roughly proportional to how promising those projects are).
For prioritization across different risks, I think P(doom) is less important because I think AI is the only risk with greater than 1% chance of existential catastrophe. Maybe you have higher estimates for other risks and this is the crux?
In terms of institutional decision making, it seems like P(doom) > 1% is sufficient to determine the sign of different interventions. In a perfect world, a 1% chance of extinction would make researchers, companies, and governments very cautious; there would be no need to narrow down the range further.
As Holden and Nathan point out, P(doom) does serve a promotional role by convincing people to focus more on AI risk, but getting more precise estimates of P(doom) isn’t necessarily the best way to convince people.
My understanding is that humanity is like a person stuck in a car whose brakes have failed, plummeting down a steep road towards a cliff. Timelines are about figuring out whether we have 1 minute to solve our dilemma or 10 days. The difference is very relevant indeed to what actions we strategically attempt in our limited time. Here’s my post about that: https://www.lesswrong.com/posts/wgcFStYwacRB8y3Yp/timelines-are-relevant-to-alignment-research-timelines-2-of

P(doom), given the situation, is primarily about two questions: will our strategic action succeed? Will we survive the crash? I agree that, given we have accurate timelines and thus make correct strategic choices about which actions to attempt, the probability of our actions succeeding isn’t relevant. We must do our best. The probability that we will die if the car goes over the edge of the cliff is relevant only insofar as it motivates us to take action. We only need agreement that it is large enough to be worth worrying about; precision is irrelevant.

Currently, lots of people who could potentially be taking helpful action are instead actively making the problem worse by working on capabilities. Talking about our reasoning for our personal estimates of P(doom) is useful if and only if it helps sway some potential safety researchers into working on safety, or some capabilities researchers into stopping work on capabilities.
Setting aside how important timelines are for strategy, the fact that P(doom) combines several questions is a good point. Another way to decompose P(doom) is:
How likely are we to survive if we do nothing about the risk? Or perhaps: How likely are we to survive if we do alignment research at the current pace?
How much can we really reduce the risk with sustained effort? How immutable is the overall risk?
People probably mean different things by P(doom), though, so it seems worthwhile to disentangle these pieces; one rough way to write the decomposition down is sketched below.
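As a sketch (with made-up notation, not anything from the thread): let $p_0$ be the probability of doom if alignment work continues at roughly the current pace, and $\Delta$ the largest reduction a sustained effort could plausibly achieve, so that

$$
P(\text{doom} \mid \text{sustained effort}) \approx p_0 - \Delta.
$$

The two questions above are estimates of $p_0$ and $\Delta$ respectively, and on this framing it is $\Delta$, rather than the headline number $p_0$, that does most of the work for prioritization.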
Good point: P(doom) also serves a promotional role, in that it illustrates the size of the problem to others and potentially gets more people to work on alignment.
Precision beyond order-of-magnitude probably doesn’t matter. But there’s not much agreement on order-of-magnitude risks. Is it 1% in the next 10 years or the next 80? Is it much over 1%, or closer to 0.1%? And is that conditional on other risks not materializing? Why would you give less than 1% to those other risks? (My suspicion is that you don’t think civilizational collapse or a 90% reduction in population is existential, which is debatable on its own.)
And even if it IS the most important (but still very unlikely) risk, that doesn’t make it the one with the highest EV to work on or donate to. You need to multiply by the amount of change you think you can make.
Yes, “precision beyond order-of-magnitude” is probably a better way to say what I was trying to say.
I would go further and say that establishing P(doom) > 1% is sufficient to make AI the most important x-risk because, as you point out, I don’t think there are other x-risks that have over a 1% chance of causing extinction (or permanent collapse). I don’t have this argument written up, but my reasoning mostly comes from the pieces I linked, in addition to John Halstead’s research on the risks from climate change.
Agreed. I don’t know of any work that addresses this question directly by trying to estimate how much different projects can reduce P(doom) but would be very interested to read something like that. I also think P(doom) sort of contains this information but people seem to use different definitions.