I’d be interested to hear what size of delay you used, and what your reasoning for that was.
I didn’t think very hard about it and just eyeballed the graph. Probably a majority of “negligible on this scale” and a minority of “years or (less likely) decades” if we’ve defined AGI too loosely and the first AGI isn’t a huge deal, or things go slowly for some other reason.
Was your main input into this parameter your perceptions of what other people would believe about this parameter?
Yes, but only because those other people seem to make reasonable arguments, so that’s kind of like believing it because of the arguments instead of the people. Some vague model of the world is probably also involved, like “avoiding AI x-risk seems like a really hard problem but it’s probably doable with enough effort and increasingly many people are taking it very seriously”.
If so, I’d be interested to hear whose beliefs you perceive yourself to be deferring to here.
MIRI people and Wei Dai for pessimism (though I’m not sure it’s their view that it’s worse than 50⁄50), Paul Christiano and other researchers for optimism.
Thanks for those responses :)
It does seem odd to me that, if you aimed to do something like averaging over these people’s views (or maybe taking a weighted average, weighting by the perceived reasonableness of their arguments), you’d end up with a 50% credence on existential catastrophe from AI. (Although now I notice you actually just said “weight it by the probability that it turns out badly instead of well”; I’m assuming by that you mean “the probability that it results in existential catastrophe”, but feel free to correct me if not.)
One MIRI person (Buck Shlegeris) has indicated they think there’s a 50% chance of that. One other MIRI-adjacent person gives estimates for similar outcomes in the range of 33-50%. I’ve also got general pessimistic vibes from other MIRI people’s writings, but I’m not aware of any other quantitative estimates from them or from Wei Dai. So my point estimate for what MIRI people think would be around 40-50%, and not well above 50%.
And I think MIRI is widely perceived as unusually pessimistic (among AI and x-risk researchers; not necessarily among LessWrong users). And people like Paul Christiano give something more like a 10% chance of existential catastrophe from AI. (Precisely what he was estimating was a little different, but similar.)
So averaging across these views would seem to give us something closer to 30%.
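As a rough back-of-the-envelope illustration (the specific inputs here are just my own approximations: ~45% as a midpoint for the MIRI-ish estimates above, ~10% for Paul, weighted equally):

\[
0.5 \times 0.45 + 0.5 \times 0.10 \approx 0.28 \approx 30\%
\]

The exact figure obviously shifts with the weights and inputs chosen; this is just to show where “closer to 30%” comes from.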
Personally, I’d also probably include various other people who seem thoughtful on this and are actively doing AI or x-risk research—e.g., Rohin Shah, Toby Ord—and these people’s estimates seem to usually be closer to Paul than to MIRI (see also). But arguing for doing that would be arguing for a different reasoning process, and I’m very happy with you using your independent judgement to decide who to defer to; I intend this comment to instead just express confusion about how your stated process reached your stated output.
(I’m getting these estimates from my database of x-risk estimates. I’m also being slightly vague because I’m still feeling a pull to avoid explicitly mentioning other views and thereby anchoring this thread.)
(I should also note that I’m not at all saying to not worry about AI—something like a 10% risk is still a really big deal!)
Yes, maybe I should have used 40% instead of 50%. I’ve seen Paul Christiano say 10-20% elsewhere. Shah and Ord are among the people I meant by “other researchers”. I’m not sure which of these estimates are conditional on superintelligence being invented. To the extent that they’re not, and to the extent that people think superintelligence may not be invented, that means they understate the conditional probability that I’m using here. I think lowish estimates of disaster risks might be more visible than high estimates because of something like social desirability, but who knows.
I’m not sure which of these estimates are conditional on superintelligence being invented. To the extent that they’re not, and to the extent that people think superintelligence may not be invented, that means they understate the conditional probability that I’m using here.
Good point. I’d overlooked that.
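(To spell that out with a toy calculation, assuming for simplicity that an existential catastrophe from AI happens only if superintelligence is in fact invented:

\[
P(\text{catastrophe} \mid \text{superintelligence}) = \frac{P(\text{catastrophe from AI})}{P(\text{superintelligence invented})}
\]

So, with purely illustrative numbers, an unconditional estimate of 10% combined with a 50% credence that superintelligence is ever invented would correspond to a conditional probability of 0.10 / 0.50 = 20%.)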
I think lowish estimates of disaster risks might be more visible than high estimates because of something like social desirability, but who knows.
(I think it’s good to be cautious about bias arguments, so take the following with a grain of salt, and note that I’m not saying any of these biases are necessarily the main factor driving estimates. I raise these points only because the possibility of bias has already been mentioned.)
I think social desirability bias could easily push the opposite way as well, especially if we’re including non-academics who dedicate their jobs or much of their time to x-risks (which I think covers the people you’re considering, except that Rohin is sort-of in academia). I’d guess the main people listening to these people’s x-risk estimates are other people who think x-risks are a big deal, and higher x-risk estimates would tend to make such people feel more validated in their overall interests and beliefs.
I can see how something like a bias towards saying things that people take seriously and that don’t seem crazy (which is perhaps a form of social desirability bias) could also push estimates down. I’d guess that that effect is stronger the closer one gets to academia or policy. I’m not sure what the net effect of the social desirability bias type stuff would be on people like MIRI, Paul, and Rohin.
I’d guess that the stronger bias would be selection effects in who even makes these estimates. I’d guess that people who work on x-risks have higher x-risk estimates than people who don’t work on them but who have still thought somewhat explicitly about the odds of x-risk. (I think a lot of people just wouldn’t have even a vague guess in mind, and could swing from casually saying extinction is likely in the next few decades to seeing that idea as crazy, depending on when you ask them.)
Quantitative x-risk estimates tend to come from the first group rather than the second, because the first group cares enough to bother estimating this. And we’d be less likely to pay attention to estimates from the second group anyway, if they existed, because they don’t seem like experts—they haven’t spent much time thinking about the issue. But they haven’t spent much time thinking about it because they don’t think the risk is high, so we’re effectively selecting whose estimates to listen to based in part on what those estimates would be.
I’d still do something similar myself—I’d pay attention to the x-risk “experts” rather than other people. And I don’t think we need to massively adjust our own estimates in light of this. But this does seem like a reason to expect the estimates to be biased upwards, compared to the estimates we’d get from a similarly intelligent and well-informed group of people who haven’t been pre-selected for a predisposition to think the risk is somewhat high.
Mostly I only start paying attention to people’s opinions on these things once they’ve demonstrated that they can reason seriously about weird futures, and I don’t think I know of any person who’s demonstrated this who thinks risk is under, say, 10%. (edit: though I wonder if Robin Hanson counts)
I don’t think I know of any person who’s demonstrated this who thinks risk is under, say, 10%
If you mean risk of extinction or existential catastrophe from AI at the time AI is developed, it seems really hard to say, as I think that that’s been estimated even less often than other aspects of AI risk (e.g. risk this century) or x-risk as a whole.
I think the only people (maybe excluding commenters who don’t work on this professionally) who’ve clearly given a greater than 10% estimate for this are:
Buck Shlegeris (50%)
Stuart Armstrong (33-50% chance humanity doesn’t survive AI)
Toby Ord (10% existential risk from AI this century, but 20% for when the AI transition happens)
Meanwhile, people who I think have effectively given <10% estimates for that (judging from estimates that weren’t conditioning on when AI was developed; all from my database):
Very likely MacAskill (well below 10% for extinction as a whole in the 21st century)
Very likely Ben Garfinkel (0-1% x-catastrophe from AI this century)
Probably the median FHI 2008 survey respondent (5% for AI extinction in the 21st century)
Probably Pamlin & Armstrong in a report (0-10% for unrecoverable collapse extinction from AI this century)
But then Armstrong separately gave a higher estimate
And I haven’t actually read the Pamlin & Armstrong report
Maybe Rohin Shah (some estimates in a comment thread)
(Maybe Hanson would also give <10%, but I haven’t seen explicit estimates from him, and his reduced focus on AI, and lower “doominess” about it, may be because he thinks timelines are longer and other things may happen first.)
I’d personally consider all the people I’ve listed to have demonstrated at least a fairly good willingness and ability to reason seriously about the future, though there’s perhaps room for reasonable disagreement here. (With the caveat that I don’t know Pamlin and don’t know precisely who was in the FHI survey.)
Mostly I only start paying attention to people’s opinions on these things once they’ve demonstrated that they can reason seriously about weird futures
[tl;dr This is an understandable thing to do, but does seem to result in biasing one’s sample towards higher x-risk estimates]
I can see the appeal of that principle. I partly apply such a principle myself (though in the form of giving less weight to some opinions, not ruling them out).
But what if it turns out the future won’t be weird in the ways you’re thinking of? Or what if it turns out that, even if it will be weird in those ways, influencing it is too hard, or just isn’t very urgent (i.e., the “hinge of history” is far from now), or is already too likely to turn out well “by default” (perhaps because future actors will also have mostly good intentions and will be more informed)?
Under such conditions, it might be that the smartest people with the best judgement won’t demonstrate that they can reason seriously about weird futures, even if they hypothetically could, because it’s just not worth their time to do so. In the same way, I haven’t demonstrated my ability to reason seriously about tax policy, because I think reasoning seriously about the long-term future is a better use of my time. Someone who starts off believing tax policy is an overwhelmingly big deal could then say “Well, Michael thinks the long-term future is what we should focus on instead, but why should I trust Michael’s view on that when he hasn’t demonstrated he can reason seriously about the importance and consequences of tax policy?”
(I think I’m being inspired here by Trammell’s interesting post “But Have They Engaged With The Arguments?” There’s some LessWrong discussion—which I haven’t read—of an early version here.)
I in fact do believe we should focus on long-term impacts, and am dedicating my career to doing so, as influencing the long-term future seems sufficiently likely to be tractable, urgent, and important. But I think there are reasonable arguments against each of those claims, and I wouldn’t be very surprised if they turned out to all be wrong. (But I think currently we’ve only had a very small part of humanity working intensely and strategically on this topic for just ~15 years, so it would seem too early to assume there’s nothing we can usefully do here.)
And if so, it would be better to try to improve the short-term future, which further future people can’t help us with, and then it would make sense for the smart people with good judgement to not demonstrate their ability to think seriously about the long-term future. So under such conditions, the people left in the sample you pay attention to aren’t the smartest people with the best judgement, and are skewed towards unreasonably high estimates of the tractability, urgency, and/or importance of influencing the long-term future.
To emphasise: I really do want way more work on existential risks and longtermism more broadly! And I do think that, when it comes to those topics, we should pay more attention to “experts” who’ve thought a lot about those topics than to other people (even if we shouldn’t only pay attention to them). I just want us to be careful about things like echo chamber effects and biasing the sample of opinions we listen to.