Thanks for writing this. I’ve been thinking along similar lines since the pandemic started. Another takeaway for me: Under our current political system, AI risk will become politicized. It will be very easy for unaligned or otherwise dangerous AI to find human “allies” who will help to prevent effective social response. Given this, “more competent institutions” has to include large-scale and highly effective reforms to our democratic political structures, but political dysfunction is such a well-known problem (i.e., not particularly neglected) that if there were easy fixes, they would have been found and applied already.
So whereas you’re careful to condition your pessimism on “unless our institutions improve”, I’m just pessimistic. (To clarify, I was already pessimistic before COVID-19, so it just provided more details about how coordination/institutions are likely to fail, which I didn’t have a clear picture of. I’m curious if COVID-19 was an update for you as far as your overall assessment of AI risk. That wasn’t totally clear from the post.)
On a related note, I recall Paul saying that the risk from failure of AI alignment (I think he said or meant “intent alignment”) is 10%; Toby Ord gave a similar number for AI risk in his recent book; 80,000 Hours, based on interviews with multiple AI risk researchers, said “We estimate that the risk of a serious catastrophe caused by machine intelligence within the next 100 years is between 1 and 10%.” Until now, 1-10% seems to have been the consensus view among the most prominent AI risk researchers. I wonder if that has changed due to recent events.
In September 2017, based on some conversations with MIRI and non-MIRI folks, I wrote:
I think that at least 80% of the AI safety researchers at MIRI, FHI, CHAI, OpenAI, and DeepMind would currently assign a >10% probability to this claim: “The research community will fail to solve one or more technical AI safety problems, and as a consequence there will be a permanent and drastic reduction in the amount of value in our future.”
People may have become more optimistic since then, but most people falling in the 1-10% range would still surprise me a lot. (Even excluding MIRI people, whose current probabilities I don’t know but who I think of as skewing pessimistic compared to other orgs.)
80,000 Hours, based on interviews with multiple AI risk researchers, said “We estimate that the risk of a serious catastrophe caused by machine intelligence within the next 100 years is between 1 and 10%.”
I complained to 80K about this back in 2017 too! :) I think 1-10% here was primarily meant to represent the median view of the 80,000 Hours team (or something to that effect), not the median view of AGI safety researchers. (Though obviously 80,000 Hours spent tons of time talking to safety researchers and taking those views into account. I just want to distinguish “this is our median view after talking to experts” from “this is our attempt to summarize experts’ median view after talking to experts”.)
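To make that distinction concrete, here is a toy sketch (in Python, with entirely made-up numbers rather than anyone’s actual estimates) of how “the fraction of researchers assigning >10%”, “the median researcher estimate”, and “one team’s own median view” can be three different summaries of the same question:

```python
# Illustrative only: these probabilities are invented for the example,
# not real survey data from any organization mentioned above.
import statistics

# Hypothetical P(existential catastrophe from AI), one per imaginary researcher.
researcher_estimates = [0.03, 0.05, 0.12, 0.15, 0.20, 0.20, 0.30, 0.40, 0.50, 0.65]

# Hypothetical estimates from a separate team writing a summary article.
team_estimates = [0.02, 0.05, 0.08]

frac_above_10 = sum(p > 0.10 for p in researcher_estimates) / len(researcher_estimates)
median_researcher = statistics.median(researcher_estimates)
median_team = statistics.median(team_estimates)

print(f"Fraction of researchers above 10%: {frac_above_10:.0%}")      # 80%
print(f"Median researcher estimate:        {median_researcher:.0%}")  # 20%
print(f"Team's own median view:            {median_team:.0%}")        # 5%
```

The point isn’t the particular numbers, just that “our median view after talking to experts” and “the experts’ median view” (let alone “the fraction of experts above some threshold”) can come apart, so a 1-10% headline figure and a claim like “80% of researchers assign >10%” aren’t directly comparable.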
What about this and this? Here, some researchers at the FHI give other probabilities.
Yeah, I’ve seen that photo before; I’m glad we have a record of this kind of thing! It doesn’t cause me to think that the thing I said in 2017 was false, though it suggests to me that most FHI staff overall in 2014 (like most 80K staff in 2017) probably would have assigned <10% probability to AGI-caused extinction (assuming there weren’t a bunch of FHI staff thinking “AGI is a lot more likely to cause non-extinction existential catastrophes” and/or “AGI has a decent chance of destroying the world, but we definitely won’t reach AGI this century”).
Should we then consider the majority of FHI experts to be extreme optimists, i.e., that same 20%? I really tried to find all the publicly available expert forecasts, and very few of the forecasters were confident that AI would lead to the extinction of humanity. But I have no reason not to believe you, or Luke Muehlhauser, who described AI safety experts as even more confident pessimists: “Many of them are, roughly speaking, 65%-85% confident that machine superintelligence will lead to human extinction.” The reason may be that people disagree about whose opinion is worth taking into account.
Thinking for a minute, I guess my unconditional probability of unaligned AI ending civilization (or something similar) is around 75%. It’s my default expected outcome.
That said, this isn’t a number I try to estimate directly very much, and I’m not sure if it would be the same after an hour of thinking about that number. Though I’d be surprised if I ended up giving more than 95% or less than 40%.
Curious where yours is at?
I’m not Wei, but I think my estimate falls within that range as well.
Thanks, Wei! I agree that improving institutions is generally very hard. In a slow takeoff scenario, there would be a new path to improving institutions using powerful (but not fully general) AI, but it’s unclear how well we could expect that to work, given the generally low priors.
The COVID response was a minor update for me in terms of AI risk assessment: it was only mildly surprising given my existing sense of institutional competence.