Thanks for this post! I definitely disagree with you about point I (I think AI doom is 70% likely and I think people who think it is less than, say, 20% are being very unreasonable) but I appreciate the feedback and constructive criticism, especially section III.
If you ever want to chat sometime (e.g. in a comment thread, or in a video call) I’d be happy to. If you are especially interested I can reply here to your object-level arguments in section I. I guess a lightning version would be “My arguments for doom don’t depend on nanotech or anything possibly-impossible like that, only on things that seem clearly possible like ordinary persuasion, hacking, engineering, warfare, etc. As for what values ASI agents would have, indeed, they could end up just wanting to get low loss or even delete themselves or something like that. But if we are training them to complete ambitious tasks in the real world (and especially, if we are training them to have ambitious aligned goals like promoting human flourishing and avoiding long-term bad consequences), they’ll probably develop ambitious goals, and even if they don’t, that only buys us a little bit of time before someone creates one that does have ambitious goals. Finally, even goals that seem very unambitious can really become ambitious goals when a superintelligence has them, for galaxy-brained reasons which I can explain if you like. As for what happens after unaligned ASI takes over the world—agreed, it’s plausible they won’t kill us. But I think it’s safe to say that unaligned ASI taking over the world would be very bad in expectation and we should work hard to avoid it.”
As a minor nitpick, 70% likely and 20% are quite close in logodds space, so it seems odd you think what you believe is reasonable and something so close is “very unreasonable”.
I agree that logodds space is the right way to think about how close probabilities are. However, my epistemic situation right now is basically this:
“It sure seems like Doom is more likely than Safety, for a bunch of reasons. However, I feel sufficiently uncertain about stuff, and humble, that I don’t want to say e.g. 99% chance of doom, or even 90%. I can in fact imagine things being OK, in a couple different ways, even if those ways seem unlikely to me. … OK, now if I imagine someone having the flipped perspective, and thinking that things being OK is more likely than doom, but being humble and thinking that they should assign at least 10% credence (but less than 20%) to doom… I’d be like ‘what are you smoking? What world are you living in, where it seems like things will be fine by default but there are a few unlikely ways things could go badly, instead of a world where it seems like things will go badly by default but there are a few unlikely ways things could go well? I mean I can see how you’d think this if you weren’t aware of how short timelines to ASI are, or if you hadn’t thought much about the alignment problem...’”
If you think this is unreasonable, I’d be interested to hear why!
I don’t think the way you imagine perspective inversion captures the typical ways of arriving at e.g. a 20% doom probability. For example, I do believe that there are multiple good things which could happen or be true and which would decrease p(doom), and I put some weight on them:
- we do discover some relatively short description of something like “harmony and kindness”; this works as an alignment target
- enough of morality is convergent
- AI progress helps with human coordination (could be in a costly way, e.g. a warning shot)
- it’s convergent to massively scale alignment efforts with AI power, and these solve some of the more obvious problems
I would expect doom to prevail conditional on only small efforts to avoid it, but I do think the actual efforts will be substantial, and this moves the chances to ~20-30%. (Also, I think most of the risk comes from not being able to deal with complex systems of many AIs and the economy decoupling from humans, and I expect single-single alignment to be solved sufficiently to prevent single-system takeover by default.)
Thanks for this comment. I’d be generally interested to hear more about how one could get to 20% doom (or less).
The list you give above is cool but doesn’t do it for me; going down the list I’d guess something like:
1. 20% likely (honesty seems like the best bet to me) because we have so little time left, but even if it happens we aren’t out of the woods yet because there are various plausible ways we could screw things up. So maybe overall this is where 1/3rd of my hope comes from.
2. 5% likely? Would want to think about this more. I could imagine myself being very wrong here actually, I haven’t thought about it enough. But it sure does sound like wishful thinking.
3. This is already happening to some extent, but the question is, will it happen enough? Overall, the “humans coordinate to not build the dangerous kinds of AI for several years, long enough to figure out how to end the acute risk period” scenario is where most of my hope comes from. I guess it’s the remaining 2/3rds basically. So, I guess I can say 20% likely.
4. What does this mean?
I would be much more optimistic if I thought timelines were longer.
This seems to violate common sense. Why would you think about this in log space? 99% and 1% are identical in if(>0) space, but they have massively different implications for how you think about a risk (just like 20% and 70% do!)
It’s a much more natural way to think about it (cf. e.g. E. T. Jaynes, Probability Theory, examples in Chapter IV).
In this specific case of evaluating hypotheses, the distance in logodds space indicates the strength of the evidence you would need to see to update. A small distance implies you don’t need that much evidence to update between the positions (note that the distance between 0.7 and 0.2 is smaller than the distance between 0.9 and 0.99). If you need only a small amount of evidence to update, it is easy to imagine some other observer, as reasonable as you, having accumulated a bit or two of evidence somewhere you haven’t seen.
Because working in logspace is way more natural, it is almost certainly also what our brains do—the “common sense” is almost certainly based on logspace representations.
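To make the distance comparison concrete, here is a minimal Python sketch that measures the gap between two probabilities in logodds space, expressed in bits. The helper names `log_odds_bits` and `evidence_gap_bits` are made up for illustration and are not taken from Jaynes or from this thread; the choice of base-2 logarithms is likewise an assumption, picked so that the result reads as "bits of evidence".

```python
import math


def log_odds_bits(p: float) -> float:
    """Log-odds of probability p, using a base-2 logarithm (units: bits)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log2(p / (1.0 - p))


def evidence_gap_bits(p: float, q: float) -> float:
    """Absolute distance between p and q in log-odds space, in bits."""
    return abs(log_odds_bits(p) - log_odds_bits(q))


# The two pairs of probabilities compared in the comment above.
gap_70_20 = evidence_gap_bits(0.70, 0.20)
gap_90_99 = evidence_gap_bits(0.90, 0.99)

print(f"0.70 vs 0.20: {gap_70_20:.2f} bits")
print(f"0.90 vs 0.99: {gap_90_99:.2f} bits")
```

Under these assumptions the 70% vs 20% gap comes out at roughly 3.2 bits, slightly smaller than the roughly 3.5 bits separating 90% from 99%.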
I seem to remember your P(doom) being 85% a short while ago. I’d be interested to know why it has dropped to 70%, or in another way of looking at it, why you believe our odds of non-doom have doubled.
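As a quick check on the "doubled" framing, the sketch below compares the non-doom probability and the non-doom odds before and after the update from 85% to 70%. The variable names are illustrative assumptions; only the two percentages come from the comment above.

```python
def odds(p: float) -> float:
    """Convert a probability p into odds in favor (p : 1 - p)."""
    return p / (1.0 - p)


# Probability of non-doom implied by P(doom) = 85% and P(doom) = 70%.
p_safe_before = 1.0 - 0.85
p_safe_after = 1.0 - 0.70

prob_ratio = p_safe_after / p_safe_before
odds_ratio = odds(p_safe_after) / odds(p_safe_before)

print(f"non-doom probability ratio: {prob_ratio:.2f}")
print(f"non-doom odds ratio:        {odds_ratio:.2f}")
```

On these numbers the probability of non-doom doubles (15% to 30%), while the odds of non-doom rise by a factor of about 2.4.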
Whereas my timelines views are extremely well thought-through (relative to most people, that is), I feel much more uncertain and unstable about p(doom). That said, here’s why I updated:
Hinton and Bengio have come out as worried about AGI x-risk; the FLI letter and Yudkowsky’s tour of podcasts, while incompetently executed, have been better received by the general public and elites than I expected; the big labs (especially OpenAI) have reiterated that superintelligent AGI is a thing, that it might come soon, that it might kill everyone, and that regulation is needed; internally, OpenAI at least has pushed more for focus on these big issues as well. Oh and there’s been some cool progress in interpretability & alignment which doesn’t come close to solving the problem on its own but makes me optimistic that we aren’t barking up the wrong trees / completely hitting a wall. (I’m thinking about e.g. the cheese vector and activation vector stuff and the discovering latent knowledge stuff)
As for capabilities, yes it’s bad that tons of people are now experimenting with AutoGPT and making their own LLM startups, and it’s bad that Google DeepMind is apparently doing some AGI mega-project, but… those things were already priced in, by me at least. I fully expected the other big corporations to ‘wake up’ at some point and start racing hard, and the capabilities we’ve seen so far are pretty much exactly on trend for my What 2026 Looks Like scenario which involved AI takeover in 2027 and singularity in 2028.
Basically, I feel like we are on track to rule out one of the possible bad futures (in which the big corporations circle the wagons and say “AGI is Safe, there is No Evidence of Danger, the AI x-risk people are Crazy Fanatics” and the government buys their story long enough for it to be too late). Now unfortunately the most likely bad future remains, in which the government does implement some regulation intended to fix the problem, but it fails to fix the problem & fails to buy us any significant amount of time before the dangerous sorts of AGI are built and deployed (e.g. because it gets watered down by tech companies averse to abandoning profitable products and lines of research, or because racing with China causes everyone to go ‘well actually’ when the time comes to slow down and change course).
Meanwhile one of the good futures (in which the regulation is good and succeeds in preventing people from building the bad kinds of AGI for years, buying us time in which to do more alignment, interpretability, and governance work, and for the world to generally get more awareness and focus on the problems) is looking somewhat more likely.
So I still think we are on a default path to doom but one of the plausible bad futures seems less likely and one of the plausible good futures seems more likely. So yeah.
Thanks for this. I was just wondering how your views have updated in light of recent events.
Like you, I also think that things are going better than my median prediction, but paradoxically I’ve been feeling even more pessimistic lately. Reflecting on this, I think my p(doom) has gone up instead of down, because some of the good futures where a lot of my probability mass for non-doom was concentrated have also disappeared, which seems to outweigh the especially bad futures going away and makes me overall more pessimistic.
These especially good futures were 1) AI capabilities hit a wall before getting to human level and 2) humanity handles AI risk especially competently, e.g., at this stage leading AI labs talk clearly about existential risks in their public communications and make serious efforts to avoid race dynamics, there is more competent public discussion of takeover risk than what we see today, including fully cooked regulatory proposals, and many people start taking less obvious (non-takeover) AI-related x-risks (like the ones Paul mentions in this post) seriously.
Makes sense. I had basically decided by 2021 that those good futures (1) and (2) were very unlikely, so yeah.
Thank you for the reply. I agree we should try and avoid AI taking over the world.
On “doom through normal means”—I just think there are very plausibly limits to what superintelligence can do. “Persuasion, hacking, and warfare” (appreciate this is not a full version of the argument) don’t seem like doom to me. I don’t believe something can persuade generals to go to war in a short period of time, just because it’s very intelligent. Reminds me of this.
On values—I think there’s a conflation between us having ambitious goals, and whatever is actually being optimized by the AI. I am curious to hear what the “galaxy brained reasons” are; my impression was, they are what was outlined (and addressed) in the original post.
A few things I’ve seen give pretty worrying lower bounds for how persuasive a superintelligence would be:
- How it feels to have your mind hacked by an AI
- The AI in a box boxes you (content warning: creepy blackmail-y acausal stuff)
Remember that a superintelligence will be at least several orders of magnitude more persuasive than character.ai or Stuart Armstrong.
Believing this seems central to believing high P(doom).
But I think it’s not a coherent enough concept to justify believing it. Yes, some people are far more persuasive than others. But how can you extrapolate that far beyond the distribution we observe in humans? I do think AI will prove to be better than humans at this, and likely much better.
But “much” better isn’t the same as “better enough to be effectively treated as magic”.
Well, even the tail of the human distribution is pretty scary. A single human with a lot of social skills can become the leader of a whole nation, or even a prophet considered literally a divine being. This has already happened several times in history, even in times when you had to be physically close to people to convince them.
Thanks to you likewise!
On doom through normal means: “Persuasion, hacking, and warfare” aren’t by themselves doom, but they can be used to accumulate lots of power, and then that power can be used to cause doom. Imagine a world in which humans are completely economically, militarily, and politically obsolete, thanks to armies of robots directed by superintelligent AIs. Such a world could and would do very nasty things to humans (e.g. let them all starve to death) unless the superintelligent AIs managing everything specifically cared about keeping humans alive and in good living conditions, because keeping humans alive & in good living conditions would, ex hypothesi, not be instrumentally valuable to the economy, or the military, etc.
How could such a world arise? Well, if we have superintelligent AIs, they can do some hacking, persuasion, and maybe some warfare, and create that world.
How long would this process take? IDK, maybe years? Could be much less. But I wouldn’t be surprised if it takes several years, maybe even five years.
I’m not conflating those things. We have ambitious goals and are trying to get our AIs to have ambitious goals—specifically we are trying to get them to have our ambitious goals. It’s not much of a stretch to imagine this going wrong, and them ending up with ambitious goals that are different from ours in various ways (even if somewhat overlapping).
Remember that persuasion from an ASI doesn’t need to look like “text-based chatting with a human.” It includes all the tools of communication available. Actually-near-flawless forgeries of any and every form of digital data you could ever ask for, as a baseline, all based on the best possible inferences made from all available real data.
How many people today are regularly persuaded of truly ridiculous things by perfectly normal human-scale-intelligent scammers, cults, conspiracy theorists, marketers, politicians, relatives, preachers, and so on? The average human, even the average IQ 120-150 human, just isn’t that resistant to persuasion in favor of untrue claims.