So it seems that, unless we expect the relevant actors to act in accordance with something close to impartial altruism, we should expect them to take some care to avoid existential risks (or extinction specifically), but far less than they really should. (Roughly this argument is made in The Precipice, and I believe by 80k.)
I agree that actors will focus on x-risk far less than they “should”—that’s exactly why I work on AI alignment! This doesn’t mean that x-risk is high in an absolute sense, just higher than it “should” be from an altruistic perspective. Presumably from an altruistic perspective x-risk should be very low (certainly below 1%), so my 10% estimate is orders of magnitude higher than what it “should” be.
Also, re: Precipice, it’s worth noting that Toby and I don’t disagree much—I estimate 1 in 10 conditioned on no action from longtermists; he estimates 1 in 5 conditioned on AGI being developed this century. Let’s say that action from longtermists can halve the risk; then my unconditional estimate would be 1 in 20, and would be very slightly higher if we condition on AGI being developed this century (because we’d have less time to prepare), so overall there’s a 4x difference, which given the huge uncertainty is really not very much.
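As a rough sanity check on the arithmetic in this comment, here is a minimal Python sketch. The variable names are hypothetical, and the factor-of-two reduction is just the "let's say" assumption from the comment, not a model anyone in the thread endorses. It also ignores the "very slightly higher" adjustment for conditioning on AGI this century.

```python
from fractions import Fraction

# Estimates quoted in the comment above (both are rough, subjective numbers).
risk_no_longtermist_action = Fraction(1, 10)  # conditioned on no action from longtermists
ord_estimate = Fraction(1, 5)                 # conditioned on AGI being developed this century

# Assumption taken from the comment: longtermist action halves the risk.
reduction_factor = Fraction(1, 2)
unconditional_risk = risk_no_longtermist_action * reduction_factor

# How far apart the two estimates end up.
ratio = ord_estimate / unconditional_risk

print(unconditional_risk)  # 1/20
print(ratio)               # 4
```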
Perhaps I should’ve been clear that I didn’t expect what I was saying to be things you hadn’t heard. (I mean, I think I watched an EAG video of you presenting on 80k’s ideas, and you were in The Precipice’s acknowledgements.)
I guess I was just suggesting that your comments there, taken by themselves/out of context, seemed to ignore those important arguments, and thus might seem overly optimistic. Which seemed mildly potentially important for someone to mention at some point, as I’ve seen this cited as an example of AI researcher optimism. (Though of course I acknowledge your comments were off the cuff and not initially intended for public consumption, and any such interview will likely contain moments that are imperfectly phrased or open to misinterpretation.)
Also, re: Precipice, it’s worth noting that Toby and I don’t disagree much—I estimate 1 in 10 conditioned on no action from longtermists; he estimates 1 in 5 conditioned on AGI being developed this century. Let’s say that action from longtermists can halve the risk; then my unconditional estimate would be 1 in 20[...] (emphasis added)
I find this quite interesting. Is this for existential risk from AI as a whole, or just “adversarial optimisation”/“misalignment” type scenarios? E.g., does it also include things like misuse and “structural risks” (e.g., AI increasing risks of nuclear war by forcing people to make decisions faster)?
I’m not saying it’d be surprisingly low if it does include those things. I’m just wondering, as estimates like this are few and far between, so now that I’ve stumbled upon one I want to understand its scope and add it to my outside view.
Also, I bolded conditioned and unconditional, because that seems to me to suggest that you also currently expect the level of longtermist intervention that would reduce the risk to 1 in 20 to happen. Like, for you, “there’s no action from longtermists” would be a specific constraint you have to add to your world model? That also makes sense; I just feel like I’ve usually not seen things presented that way.
I imagine you could also condition on something like “surprisingly much action from longtermists”, which would reduce your estimated risk further?
I guess I was just suggesting that your comments there, taken by themselves/out of context, seemed to ignore those important arguments, and thus might seem overly optimistic.
Sure, that seems reasonable.
Is this for existential risk from AI as a whole, or just “adversarial optimisation”/“misalignment” type scenarios?
Just adversarial optimization / misalignment. See the comment thread with Wei Dai below, especially this comment.
Like, for you, “there’s no action from longtermists” would be a specific constraint you have to add to your world model?
Oh yeah, definitely. (Toby does the same in The Precipice; his position is that it’s clearer not to condition on anything, because it’s usually unclear what exactly you are conditioning on, though in person he did like the operationalization of “without action from longtermists”.)
Like, my model of the world is that for any sufficiently important decision like the development of powerful AI systems, there are lots of humans bringing many perspectives to the table, which usually ends up with most considerations being brought up by someone, and an overall high level of risk aversion. On this model, longtermists are one of the many groups that argue for being more careful than we otherwise would be.
I imagine you could also condition on something like “surprisingly much action from longtermists”, which would reduce your estimated risk further?
Yeah, presumably. The 1 in 20 number was very made up, even more so than the 1 in 10 number. I suppose if our actions were very successful, I could see us getting down to 1 in 1000? But if we just exerted a lot more effort (i.e. “surprisingly much action”), the extra effort probably doesn’t help much more than the initial effort, so maybe… 1 in 25? 1 in 30?
(All of this is very anchored on the initial 1 in 10 number.)
And yes, this does seem quite consistent with Ord’s framing. E.g., he writes “my estimates above incorporate the possibility that we get our act together and start taking these risks very seriously.” So I guess I’ve seen it presented this way at least that once, but I’m not sure I’ve seen it made explicit like that very often (and doing so seems useful and retrospectively-obvious).
But if we just exerted a lot more effort (i.e. “surprisingly much action”), the extra effort probably doesn’t help much more than the initial effort, so maybe… 1 in 25? 1 in 30?
Are you thinking roughly that (a) returns diminish steeply from the current point, or (b) that effort will likely ramp up a lot in future and pluck a large quantity of the low-hanging fruit that currently remains, such that even more ramping up would face steeply diminishing returns?
That’s a vague question, and may not be very useful. The motivation for it is that I was surprised you saw the gap between business as usual and “surprisingly much action” as being as small as you did, and I wonder roughly what portion of that is about you thinking additional people working on this won’t be very useful, vs. thinking that very useful additional people will eventually jump aboard “by default”.
Are you thinking roughly that (a) returns diminish steeply from the current point, or (b) that effort will likely ramp up a lot in future and pluck a large quantity of the low-hanging fruit that currently remains, such that even more ramping up would face steeply diminishing returns?
More like (b) than (a). In particular, I’m thinking of lots of additional effort by longtermists, which probably doesn’t result in lots of additional effort by everyone else, which already means that we’re scaling sublinearly. In addition, you should then expect diminishing marginal returns to more research, which lessens it even more.
Also, I was thinking about this recently, and I am pretty pessimistic about worlds with discontinuous takeoff, which should maybe add another ~5 percentage points to my risk estimate conditional on no intervention by longtermists, and ~4 percentage points to my unconditional risk estimate.
So you’ve updated your unconditional estimate from ~5% (1 in 20) to ~9%? If so, people may have to stop citing you as an “optimist”… (which was already perhaps a tad misleading, given what the 1 in 20 was about)
(I mean, I know we’re all sort-of just playing with incredibly uncertain numbers about fuzzy scenarios anyway, but still.)
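For readers keeping track of the numbers, the update discussed here can be written out as a small Python sketch. The variable names are hypothetical, and the last two lines are only a derived observation from the thread's own figures: the implied share of risk remaining after longtermist action shifts from a half to three fifths. None of this is more precise than the made-up inputs.

```python
from fractions import Fraction

# Figures from the thread, in percent (all rough, subjective numbers).
conditional_before = 10    # risk given no action from longtermists
unconditional_before = 5   # the earlier "1 in 20" figure

# Adjustment for discontinuous-takeoff worlds, in percentage points.
conditional_after = conditional_before + 5
unconditional_after = unconditional_before + 4

print(conditional_after, unconditional_after)  # 15 9

# Implied fraction of the risk that remains after longtermist action.
remaining_before = Fraction(unconditional_before, conditional_before)
remaining_after = Fraction(unconditional_after, conditional_after)
print(remaining_before, remaining_after)  # 1/2 3/5
```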
If so, people may have to stop citing you as an “optimist”
I wouldn’t be surprised if the median number from MIRI researchers was around 50%. I think the people who cite me as an optimist are people with those background beliefs. I think even at 5% I’d fall on the pessimistic side at FHI (though certainly not the most pessimistic, e.g. Toby is more pessimistic than I am).
“Actually, the people Tim is talking about here are often more pessimistic about societal outcomes than Tim is suggesting. Many of them are, roughly speaking, 65%-85% confident that machine superintelligence will lead to human extinction, and that it’s only in a small minority of possible worlds that humanity rises to the challenge and gets a machine superintelligence robustly aligned with humane values.” — Luke Muehlhauser, https://lukemuehlhauser.com/a-reply-to-wait-but-why-on-machine-superintelligence/
“In terms of falsifiability, if you have an AGI that passes the real no-holds-barred Turing Test over all human capabilities that can be tested in a one-hour conversation, and life as we know it is still continuing 2 years later, I’m pretty shocked. In fact, I’m pretty shocked if you get up to that point at all before the end of the world.” — Eliezer Yudkowsky, https://www.econlib.org/archives/2016/03/so_far_my_respo.html
Thanks for this reply!
Quite interesting. Thanks for that response.
Interesting (again!).
It may be useful.