You could imagine a situation where for some reason the US and China are like, “Whoever gets to AGI first just wins the universe.” And I think in that scenario maybe I’m a bit worried, but even then, it seems like extinction is just worse, and as a result, you get significantly less risky behavior? But I don’t think you get to the point where people are just literally racing ahead with no thought to safety for the sake of winning.
My interpretation of what Rohin is saying there is:
1) Extinction is an extremely bad outcome.
2) It’s much worse than ‘losing’ an international competition to ‘win the universe’.
3) Countries/institutions/people will therefore be significantly inclined to avoid risking extinction, even if taking that risk would increase the chances of ‘winning’ an international competition to ‘win the universe’.
I agree with claim 1.
I agree with some form of claim 3, in that:
I think the badness of extinction will reduce the risks people are willing to take
I also “don’t think you get to the point where people are just literally racing ahead with no thought to safety for the sake of winning.”
But I don’t think the risks will be reduced anywhere near as much as they should be. (That said, I also believe the odds are in favour of things “going well by default”, just not as much in favour of that as I’d like.)
This is related to my sense that claim 2 is somewhat tricky/ambiguous. Are we talking about whether it is worse, or about whether the relevant actors will perceive it as worse? One common argument for why existential risks are neglected is that this is basically a standard market failure. The vast majority of the harm from x-risks consists of externalities, and x-risk reduction is a global public good. Even if we consider only the deaths/suffering in the present generation, even China and India absorb less than half of that “cost”, and most countries absorb less than 1% of it. And I believe most people focused on x-risk reduction are at least broadly longtermist, so they’d perceive the overwhelming majority of the costs as falling on future generations, and thus as also being externalities.
So it seems like, unless we expect the relevant actors to act in accordance with something close to impartial altruism, we should expect them to reduce risks somewhat in order to avoid existential risks (or extinction specifically), but far less than they really should. (Roughly this argument is made in The Precipice, and I believe by 80k.)
(Rohin also discusses right after that quote why he doesn’t “think that differences in who gets to AGI first are going to lead to you win the universe or not”, which I do think somewhat bolsters the case for claim 2.)
So it seems like, unless we expect the relevant actors to act in accordance with something close to impartial altruism, we should expect them to reduce risks somewhat in order to avoid existential risks (or extinction specifically), but far less than they really should. (Roughly this argument is made in The Precipice, and I believe by 80k.)
I agree that actors will focus on x-risk far less than they “should”—that’s exactly why I work on AI alignment! This doesn’t mean that x-risk is high in an absolute sense, just higher than it “should” be from an altruistic perspective. Presumably from an altruistic perspective x-risk should be very low (certainly below 1%), so my 10% estimate is orders of magnitude higher than what it “should” be.
Also, re: Precipice, it’s worth noting that Toby and I don’t disagree much—I estimate 1 in 10 conditioned on no action from longtermists; he estimates 1 in 5 conditioned on AGI being developed this century. Let’s say that action from longtermists can halve the risk; then my unconditional estimate would be 1 in 20, and would be very slightly higher if we condition on AGI being developed this century (because we’d have less time to prepare), so overall there’s a 4x difference, which given the huge uncertainty is really not very much.
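For concreteness, here is a minimal sketch of the arithmetic in that comparison. The “halve the risk” factor is the assumption stated in the comment rather than a derived quantity, and the sketch ignores the “very slightly higher” adjustment from conditioning on AGI being developed this century:

```python
# Minimal sketch of the arithmetic above. The 2x reduction from longtermist
# action is the assumption named in the comment, not a derived quantity.
rohin_conditional = 1 / 10     # risk conditioned on no action from longtermists
longtermist_reduction = 1 / 2  # assume longtermist action halves the risk
rohin_unconditional = rohin_conditional * longtermist_reduction  # 1 in 20

toby_conditional = 1 / 5       # Ord's estimate, conditioned on AGI this century

print(f"Rohin, unconditional: {rohin_unconditional:.0%}")                # 5%
print(f"Toby, conditional on AGI this century: {toby_conditional:.0%}")  # 20%
print(f"Ratio: {toby_conditional / rohin_unconditional:.0f}x")           # 4x
```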
Perhaps I should’ve been clearer that I didn’t expect what I was saying to be things you hadn’t heard. (I mean, I think I watched an EAG video of you presenting on 80k’s ideas, and you were in The Precipice’s acknowledgements.)
I guess I was just suggesting that your comments there, taken by themselves/out of context, seemed to ignore those important arguments, and thus might seem overly optimistic. That seemed at least mildly important for someone to mention at some point, as I’ve seen this cited as an example of AI researcher optimism. (Though of course I acknowledge your comments were off the cuff and not initially intended for public consumption, and any such interview will likely contain moments that are imperfectly phrased or open to misinterpretation.)
Also, re: Precipice, it’s worth noting that Toby and I don’t disagree much—I estimate 1 in 10 conditioned on no action from longtermists; he estimates 1 in 5 conditioned on AGI being developed this century. Let’s say that action from longtermists can halve the risk; then my unconditional estimate would be 1 in 20[...] (emphasis added)
I find this quite interesting. Is this for existential risk from AI as a whole, or just “adversarial optimisation”/”misalignment” type scenarios? E.g., does it also include things like misuse and “structural risks” (e.g., AI increasing risks of nuclear war by forcing people to make decisions faster)?
I’m not saying it’d be surprisingly low if it does include those things. I’m just wondering: estimates like this are few and far between, so now that I’ve stumbled upon one I want to understand its scope and add it to my outside view.
Also, I bolded conditioned and unconditional, because that seems to me to suggest that you also currently expect the level of longtermist intervention that would reduce the risk to 1 in 20 to happen. Like, for you, “there’s no action from longtermists” would be a specific constraint you have to add to your world model? That also makes sense; I just feel like I’ve usually not seen things presented that way.
I imagine you could also condition on something like “surprisingly much action from longtermists”, which would reduce your estimated risk further?
I guess I was just suggesting that your comments there, taken by themselves/out of context, seemed to ignore those important arguments, and thus might seem overly optimistic.
Sure, that seems reasonable.
Is this for existential risk from AI as a whole, or just “adversarial optimisation”/”misalignment” type scenarios?
Just adversarial optimization / misalignment. See the comment thread with Wei Dai below, especially this comment.
Like, for you, “there’s no action from longtermists” would be a specific constraint you have to add to your world model?
Oh yeah, definitely. (Toby does the same in The Precipice; his position is that it’s clearer not to condition on anything, because it’s usually unclear what exactly you are conditioning on, though in person he did like the operationalization of “without action from longtermists”.)
Like, my model of the world is that for any sufficiently important decision like the development of powerful AI systems, there are lots of humans bringing many perspectives to the table, which usually ends up with most considerations being brought up by someone, and an overall high level of risk aversion. On this model, longtermists are one of the many groups that argue for being more careful than we otherwise would be.
I imagine you could also condition on something like “surprisingly much action from longtermists”, which would reduce your estimated risk further?
Yeah, presumably. The 1 in 20 number was very made up, even more so than the 1 in 10 number. I suppose if our actions were very successful, I could see us getting down to 1 in 1000? But if we just exerted a lot more effort (i.e. “surprisingly much action”), the extra effort probably doesn’t help much more than the initial effort, so maybe… 1 in 25? 1 in 30?
(All of this is very anchored on the initial 1 in 10 number.)
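Collecting the rough figures from this exchange in one place, as a sketch only: every number is, as noted, heavily anchored on the initial 1-in-10 estimate, and the “surprisingly much action” entry uses the lower end of the “1 in 25? 1 in 30?” range.

```python
# Rough risk estimates from the exchange above, all anchored on the initial
# 1-in-10 number. "Surprisingly much action" takes the lower end of the
# stated "1 in 25? 1 in 30?" range.
risk_by_scenario = {
    "no action from longtermists": 1 / 10,
    "default longtermist action (risk halved)": 1 / 20,
    "surprisingly much longtermist action": 1 / 25,
    "very successful longtermist action": 1 / 1000,
}
for scenario, risk in risk_by_scenario.items():
    print(f"{scenario}: {risk:.1%}")
```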
And yes, this does seem quite consistent with Ord’s framing. E.g., he writes “my estimates above incorporate the possibility that we get our act together and start taking these risks very seriously.” So I guess I’ve seen it presented this way at least that once, but I’m not sure I’ve seen it made explicit like that very often (and doing so seems useful and retrospectively-obvious).
But if we just exerted a lot more effort (i.e. “surprisingly much action”), the extra effort probably doesn’t help much more than the initial effort, so maybe… 1 in 25? 1 in 30?
Are you thinking roughly (a) that returns diminish steeply from the current point, or (b) that effort will likely ramp up a lot in future and pluck a large quantity of the low-hanging fruit that currently remains, such that even more ramping up would face steeply diminishing returns?
That’s a vague question, and may not be very useful. The motivation for it is that I was surprised you saw the gap between business as usual and “surprisingly much action” as being as small as you did, and I wonder roughly what portion of that is about you thinking additional people working on this won’t be very useful, vs thinking that very useful additional people will eventually jump aboard “by default”.
Are you thinking roughly (a) that returns diminish steeply from the current point, or (b) that effort will likely ramp up a lot in future and pluck a large quantity of the low-hanging fruit that currently remains, such that even more ramping up would face steeply diminishing returns?
More like (b) than (a). In particular, I’m thinking of lots of additional effort by longtermists, which probably doesn’t result in lots of additional effort by everyone else, which already means that we’re scaling sublinearly. In addition, you should then expect diminishing marginal returns to more research, which lessens it even more.
Also, I was thinking about this recently, and I am pretty pessimistic about worlds with discontinuous takeoff, which should maybe add another ~5 percentage points to my risk estimate conditional on no intervention by longtermists, and ~4 percentage points to my unconditional risk estimate.
So you’ve updated your unconditional estimate from ~5% (1 in 20) to ~9%? If so, people may have to stop citing you as an “optimist”… (which was already perhaps a tad misleading, given what the 1 in 20 was about)
(I mean, I know we’re all sort-of just playing with incredibly uncertain numbers about fuzzy scenarios anyway, but still.)
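Spelling out that update explicitly, as a sketch that simply takes the ~5 and ~4 percentage-point adjustments at face value:

```python
# Adding the stated percentage-point adjustments for pessimism about
# discontinuous takeoff to the earlier estimates.
conditional_before = 0.10    # 1 in 10, conditional on no action from longtermists
unconditional_before = 0.05  # 1 in 20

conditional_after = conditional_before + 0.05      # ~15%
unconditional_after = unconditional_before + 0.04  # ~9%

print(f"Conditional on no longtermist action: ~{conditional_after:.0%}")
print(f"Unconditional: ~{unconditional_after:.0%}")
```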
If so, people may have to stop citing you as an “optimist”
I wouldn’t be surprised if the median number from MIRI researchers was around 50%. I think the people who cite me as an optimist are people with those background beliefs. I think even at 5% I’d fall on the pessimistic side at FHI (though certainly not the most pessimistic, e.g. Toby is more pessimistic than I am).
‘Actually, the people Tim is talking about here are often more pessimistic about societal outcomes than Tim is suggesting. Many of them are, roughly speaking, 65%-85% confident that machine superintelligence will lead to human extinction, and that it’s only in a small minority of possible worlds that humanity rises to the challenge and gets a machine superintelligence robustly aligned with humane values.’ — Luke Muehlhauser, https://lukemuehlhauser.com/a-reply-to-wait-but-why-on-machine-superintelligence/
‘In terms of falsifiability, if you have an AGI that passes the real no-holds-barred Turing Test over all human capabilities that can be tested in a one-hour conversation, and life as we know it is still continuing 2 years later, I’m pretty shocked. In fact, I’m pretty shocked if you get up to that point at all before the end of the world.’ — Eliezer Yudkowsky, https://www.econlib.org/archives/2016/03/so_far_my_respo.html