Jan_Kulveit comments on Former OpenAI Superalignment Researcher: Superintelligence by 2030

Jan_Kulveit 5 Jun 2024 9:17 UTC
91 points
69
(crossposted from twitter) Main thoughts:
1. Maps pull the territory
2. Beware what maps you summon

Leopold Aschenbrenner’s series of essays is a fascinating read: there are a ton of locally valid observations and arguments. A lot of the content is the type of stuff mostly discussed in private. Many of the high-level observations are correct.
At the same time, my overall impression is that the set of maps sketched pulls toward existential catastrophe, and this is true not only for the ‘this is how things can go wrong’ part, but also for the ‘this is how we solve things’ part. Leopold is likely aware of this angle of criticism, and deflects it with ‘this is just realism’ and ‘I don’t wish things were like this, but they most likely are’. I basically don’t buy that claim.
- No77e 5 Jun 2024 17:21 UTC
  19 points
  18
  He’s starting an AGI investment firm that invests based on his thesis, so he does have a direct financial incentive to make this scenario more likely
  - Rob Bensinger 6 Jun 2024 23:35 UTC
    16 points
    11
    (Though he also has an incentive to not die.)
- Julian Bradshaw 5 Jun 2024 17:22 UTC
  9 points
  −7
  I agree that it’s a good read.
  I don’t agree that it “pulls towards existential catastrophe”. Pulls towards catastrophe, certainly, but not existential catastrophe? He’s explicitly not a doomer,^[1] and is much more focused on really-bad-but-survivable harms like WW3, authoritarian takeover, and societal upheaval.
  1. ^
    Page 105 of the PDF, “I am not a doomer.”, with a footnote where he links a Yudkowsky tweet agreeing that he’s not a doomer. Also, he listed his p(doom) as 5% last year. I didn’t see an updated p(doom) in Situational Awareness or his Dwarkesh interview, though I might have missed it.
  - Nathan Helm-Burger 6 Jun 2024 12:09 UTC
    16 points
    8
    Whether a work ‘pulls towards catastrophe’ doesn’t depend on whether the author believes it does. The direction of the pull is in the eye of the reader. Therefore, you must evaluate whether Jan (or you, or I) believe that the futures which Leopold’s maps pull us toward will result in existential catastrophes. For a simplified explanation, imagine that Leopold is driving fast at night on a winding cliffside road, and his vision is obscured by a heads-up display of a map of his own devising. If his map directs him to take a left and he drives over the cliff edge… It doesn’t matter where Leopold thought he would end up, it matters where he got to. If you are his passenger, you should care more about where you think his navigation is likely to actually take you than about where Leopold believes it will take you.
    - Garrett Baker 6 Jun 2024 18:23 UTC
      8 points
      7
      I think this gets more tricky because of coordination. Leopold’s main effect is in selling maps, not using them. If his maps list a town in a particular location, which consumers and producers both travel to expecting a town, then his map has reshaped the territory and caused a town to exist.
      
      Pointing out one concrete dynamic here: most of his argument boils down to “we must avoid a disastrous AI arms race by racing faster than our enemies to ASI”, but of course it is unclear whether an “AI arms race” would even exist if nobody were talking about an “AI arms race”, that is, if everyone were just following incentives and coordinating rationally with their competitors.
      
      There’s also obviously the classic “AGI will likely end the world, thus I should invest in / work on it since if it doesn’t I’ll be rich, therefore AGI is more likely to end the world” self-fulfilling prophecy that has been a scourge on our field since the founding of DeepMind.
    - Julian Bradshaw 6 Jun 2024 16:54 UTC
      6 points
      0
      Hm, I was interpreting ‘pulls towards existential catastrophe’ as meaning Leopold’s map mismatches the territory because it overrates the chance of existential catastrophe.
      If the argument is instead “Leopold publishing his map increases the chance of existential catastrophe” (by charging race dynamics, for example) then I agree that’s plausible. (Though I don’t think the choice to publish it was inexcusable—the effects are hard to predict, and there’s much to be said for trying to say true things.)
      If the argument is “following Leopold’s plan likely leads to existential catastrophe”, same opinion as above.
      - Nathan Helm-Burger 6 Jun 2024 17:42 UTC
        2 points
        2
        Oh huh, I hadn’t even considered that interpretation. Personally, I think Leopold’s key error is in underrating how soon we will get to AGI if we continue as we have been, and in not thinking that that is as dangerous an achievement as I think it is.
        So, if your interpretation of ‘overrates chance of existential catastrophe’ is correct, I am of the opposite opinion. Seems like Leopold expects we can make good use of AGI without a bunch more alignment. I think we’ll just doom ourselves if we try to use it.
        Linch 6 Jun 2024 22:44 UTC
        4 points
        0
        “Personally, I think Leopold’s key error is in underrating how soon we will get to AGI if we continue as we have been”
        
        His modal time-to-AGI is like 2027, with a 2-3 year intelligence explosion afterwards before humanity is ~ irrelevant.
        “and in not thinking that that is as dangerous an achievement as I think it is.”
        Yeah this seems likely.
        Nathan Helm-Burger 7 Jun 2024 12:27 UTC
        10 points
        8
        Yes, and my modal time-to-AGI is late 2025 / early 2026. I think we’re right on the brink of a pre-AGI recursive self-improvement loop which will quickly rocket us past AGI. I think we are already in a significant compute overhang and data overhang, meaning that software improvements alone can be more than sufficient. In other words, I am concerned.
        Ann 7 Jun 2024 13:09 UTC
        3 points
        0
        The difference between these two estimates feels like it can be pretty well accounted for by reasonable expected development friction for prototype-humanish-level self-improvers, who will still be subject to many (minus some) of the same limitations that prevent “9 women from growing a baby in a month”. You can predict they’ll be able to lubricate more or less of that, but we can’t currently strictly scale project speeds by throwing masses of software engineers and money at them.
        Nathan Helm-Burger 7 Jun 2024 15:23 UTC
        3 points
        1
        I believe you are correct about the importance of taking these phenomena into account: indivisibility of certain serial tasks, coordination overhead of larger team sizes. I do think that my model takes these into account.
        
        It’s certainly possible that my model is wrong. I feel like there’s a lot of uncertainty in many key variables, and likely I have overlooked things. The phenomena you point out don’t happen to be things that I neglected to consider though.
        Ann 7 Jun 2024 16:33 UTC
        1 point
        0
        I understand—my point is more that the difference between these two positions could be readily explained by you being slightly more optimistic in estimated task time when doing the accounting, and the voice of experience saying “take your best estimate of the task time, and double it, and that’s what it actually is”.
  - Simon Lermen 5 Jun 2024 19:13 UTC
    12 points
    10
    One example: Leopold spends a lot of time talking about how we need to beat China to AGI and even talks about how we will need to build robo armies. He paints it as liberal democracy against the CCP. Seems that he would basically burn timeline and accelerate to beat China. At the same time, he doesn’t really talk about his plan for alignment which kind of shows his priorities. I think his narrative shifts the focus from the real problem (alignment).
    
    This part shows some of his thinking. Dwarkesh makes some good counterpoints here, such as asking how Donald Trump having this power is better than Xi having it.