(crossposted from twitter) Main thoughts:
1. Maps pull the territory
2. Beware what maps you summon
Leopold Aschenbrenner’s series of essays is a fascinating read: there are a ton of locally valid observations and arguments. A lot of the content is the type of stuff mostly discussed in private. Many of the high-level observations are correct.
At the same time, my overall impression is that the set of maps sketched pulls toward existential catastrophe, and this is true not only for the ‘this is how things can go wrong’ part, but also for the ‘this is how we solve things’ part. Leopold is likely aware of this angle of criticism, and deflects it with ‘this is just realism’ and ‘I don’t wish things were like this, but they most likely are’. I basically don’t buy that claim.
He’s starting an AGI investment firm that invests based on his thesis, so he does have a direct financial incentive to make this scenario more likely.
(Though he also has an incentive to not die.)
I agree that it’s a good read.
I don’t agree that it “pulls towards existential catastrophe”. Pulls towards catastrophe, certainly, but not existential catastrophe? He’s explicitly not a doomer,[1] and is much more focused on really-bad-but-survivable harms like WW3, authoritarian takeover, and societal upheaval.
[1] Page 105 of the PDF, “I am not a doomer.”, with a footnote where he links a Yudkowsky tweet agreeing that he’s not a doomer. Also, he listed his p(doom) as 5% last year. I didn’t see an updated p(doom) in Situational Awareness or his Dwarkesh interview, though I might have missed it.
The question of ‘pulls towards catastrophe’ doesn’t depend on whether the author believes their work pulls towards catastrophe. The direction of the pull is in the eye of the reader. Therefore, you must evaluate whether Jan (or you, or I) believe that the futures which Leopold’s maps pull us toward will result in existential catastrophes. For a simplified explanation, imagine that Leopold is driving fast at night on a winding cliffside road, and his vision is obscured by a heads-up display of a map of his own devising. If his map directs him to take a left and he drives over the cliff edge… It doesn’t matter where Leopold thought he would end up, it matters where he got to. If you are his passenger, you should care more about where you think his navigation is likely to actually take you than about where Leopold believes his navigation will take you.
I think this gets more tricky because of coordination. Leopold’s main effect is in selling maps, not using them. If his maps list a town in a particular location, which consumers and producers both travel to expecting a town, then his map has reshaped the territory and caused a town to exist.
To point out one concrete dynamic here: most of his argument boils down to “we must avoid a disastrous AI arms race by racing faster than our enemies to ASI”, but of course it is unclear whether an “AI arms race” would even exist if nobody were talking about an “AI arms race”. That is, if everyone were instead just following incentives and coordinating rationally with their competitors.
There’s also obviously the classic “AGI will likely end the world, thus I should invest in / work on it since if it doesn’t I’ll be rich, therefore AGI is more likely to end the world” self-fulfilling prophecy that has been a scourge on our field since the founding of DeepMind.
Hm, I was interpreting ‘pulls towards existential catastrophe’ as meaning Leopold’s map mismatches the territory because it overrates the chance of existential catastrophe.
If the argument is instead “Leopold publishing his map increases the chance of existential catastrophe” (by charging race dynamics, for example) then I agree that’s plausible. (Though I don’t think the choice to publish it was inexcusable—the effects are hard to predict, and there’s much to be said for trying to say true things.)
If the argument is “following Leopold’s plan likely leads to existential catastrophe”, same opinion as above.
Oh huh, I hadn’t even considered that interpretation. Personally, I think Leopold’s key error is in underrating how soon we will get to AGI if we continue as we have been, and in not thinking that that is as dangerous an achievement as I think it is.
So, if your interpretation of ‘overrates chance of existential catastrophe’ is correct, I am of the opposite opinion. Seems like Leopold expects we can make good use of AGI without a bunch more alignment. I think we’ll just doom ourselves if we try to use it.
His modal time-to-AGI is like 2027, with a 2-3 year intelligence explosion afterwards before humanity is ~ irrelevant.
Yeah this seems likely.
Yes, and my modal time-to-AGI is late 2025 / early 2026. I think we’re right on the brink of a pre-AGI recursive self-improvement loop which will quickly rocket us past AGI. I think we are already in a significant compute overhang and data overhang. In other words, I think software improvements alone can be more than sufficient. So, I am concerned.
The difference between these two estimates feels like it can be pretty well accounted for by reasonable expected development friction for prototype-humanish-level self-improvers, who will still be subject to many (minus some) of the same limitations that prevent “9 women from growing a baby in a month”. You can predict they’ll be able to lubricate more or less of that, but we can’t currently strictly scale project speeds by throwing masses of software engineers and money at it.
I believe you are correct about the importance of taking these phenomena into account: indivisibility of certain serial tasks, coordination overhead of larger team sizes. I do think that my model takes these into account.
It’s certainly possible that my model is wrong. I feel like there’s a lot of uncertainty in many key variables, and likely I have overlooked things. The phenomena you point out don’t happen to be things that I neglected to consider though.
I understand—my point is more that the difference between these two positions could be readily explained by you being slightly more optimistic in estimated task time when doing the accounting, and the voice of experience saying “take your best estimate of the task time, and double it, and that’s what it actually is”.
One example: Leopold spends a lot of time talking about how we need to beat China to AGI and even talks about how we will need to build robo armies. He paints it as liberal democracy against the CCP. It seems that he would basically burn timeline and accelerate to beat China. At the same time, he doesn’t really talk about his plan for alignment, which kind of shows his priorities. I think his narrative shifts the focus away from the real problem (alignment).
This part shows some of his thinking. Dwarkesh makes some good counterpoints here, like asking how Donald Trump having this power is better than Xi having it.