Whether a work ‘pulls towards catastrophe’ doesn’t depend on whether its author believes it does. The direction of the pull is in the eye of the reader. Therefore, you must evaluate whether Jan (or you, or I) believe that the futures which Leopold’s maps pull us toward will result in existential catastrophes. As a simplified illustration: imagine that Leopold is driving fast at night on a winding cliffside road, and his vision is obscured by a heads-up display showing a map of his own devising. If his map directs him to take a left and he drives over the cliff edge… it doesn’t matter where Leopold thought he would end up; it matters where he actually got to. If you are his passenger, you should care more about where you think his navigation is actually likely to take you than about where Leopold believes it will.
I think this gets more tricky because of coordination. Leopold’s main effect is in selling maps, not using them. If his map marks a town at a particular location, and consumers and producers alike travel there expecting a town, then his map has reshaped the territory and caused a town to exist.
To point out one concrete dynamic here: most of his argument boils down to “we must avoid a disastrous AI arms race by racing faster than our enemies to ASI”, but of course it is unclear whether an “AI arms race” would even exist if nobody were talking about an “AI arms race”; that is, if everyone were simply following their incentives and coordinating rationally with their competitors.
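One way to make this dynamic concrete (my framing, not necessarily the commenter’s): an “arms race” can be a stag hunt rather than a prisoner’s dilemma. In a stag hunt, both mutual cooperation and mutual racing are equilibria, and which one you land in depends on what each side *expects* the other to do. That is exactly why talk of an arms race can itself create one. A minimal sketch, with hypothetical payoff numbers chosen only to satisfy the stag-hunt structure:

```python
# Hypothetical stag-hunt payoffs (illustrative numbers, not from the source):
# (row_action, col_action) -> (row_payoff, col_payoff)
PAYOFFS = {
    ("cooperate", "cooperate"): (4, 4),  # safe, coordinated development
    ("cooperate", "race"):      (0, 3),  # the cooperator is left behind
    ("race",      "cooperate"): (3, 0),
    ("race",      "race"):      (1, 1),  # costly, risky race
}

def best_response(opponent_action: str) -> str:
    """Row player's best reply, given a belief about the opponent's action."""
    return max(("cooperate", "race"),
               key=lambda a: PAYOFFS[(a, opponent_action)][0])

# Expectations select the equilibrium:
# best_response("cooperate") -> "cooperate"
# best_response("race")      -> "race"
```

The point of the sketch is that, unlike in a prisoner’s dilemma, racing here is not dominant: it is only a best response if you already believe the other side is racing, so loudly predicting a race can move everyone to the worse equilibrium.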
There’s also, obviously, the classic “AGI will likely end the world, thus I should invest in / work on it, since if it doesn’t end the world I’ll be rich, and therefore AGI is more likely to end the world” self-fulfilling prophecy, which has been a scourge on our field since the founding of DeepMind.
Hm, I was interpreting ‘pulls towards existential catastrophe’ as meaning Leopold’s map mismatches the territory because it overrates the chance of existential catastrophe.
If the argument is instead “Leopold publishing his map increases the chance of existential catastrophe” (by charging up race dynamics, for example), then I agree that’s plausible. (Though I don’t think the choice to publish it was inexcusable; the effects are hard to predict, and there’s much to be said for trying to say true things.)
If the argument is “following Leopold’s plan likely leads to existential catastrophe”, same opinion as above.
Oh huh, I hadn’t even considered that interpretation. Personally, I think Leopold’s key error is underrating how soon we will get to AGI if we continue as we have been, and not regarding that as being as dangerous an achievement as I think it is.
So, if your interpretation of ‘overrates chance of existential catastrophe’ is correct, I am of the opposite opinion. Seems like Leopold expects we can make good use of AGI without a bunch more alignment. I think we’ll just doom ourselves if we try to use it.
His modal time-to-AGI is like 2027, with a 2-3 year intelligence explosion afterwards before humanity is ~ irrelevant.
Yeah this seems likely.
Yes, and my modal time-to-AGI is late 2025 / early 2026. I think we’re right on the brink of a pre-AGI recursive self-improvement loop which will quickly rocket us past AGI, and that we are already in a significant compute overhang and data overhang; in other words, software improvements alone could be more than sufficient. In short, I am concerned.
The difference between these two estimates feels like it can be pretty well accounted for by reasonable expected development friction for prototype-humanish-level self-improvers, who will still be subject to many (minus some) of the same limitations that prevent “nine women from growing a baby in a month”. You can predict they’ll be able to lubricate more or less of that, but we can’t currently scale project speed simply by throwing masses of software engineers and money at it.
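The limitation being invoked here is essentially Amdahl’s law combined with Brooks’s law: if some fraction of a project is inherently serial, adding workers only speeds up the parallel remainder, and coordination overhead eventually makes further additions counterproductive. A minimal sketch, with illustrative parameters I am choosing for the example rather than taking from the discussion:

```python
# Sketch of Amdahl's law plus Brooks's-law coordination overhead.
# `serial` and `overhead` are hypothetical parameters for illustration.

def project_speedup(workers: int, serial: float, overhead: float = 0.0) -> float:
    """Relative speedup versus a single worker.

    serial:   fraction of the work that cannot be parallelized (0..1)
    overhead: per-pair coordination cost, as a fraction of total work
    """
    parallel = 1.0 - serial
    pairs = workers * (workers - 1) / 2  # communication channels in the team
    time = serial + parallel / workers + overhead * pairs
    return 1.0 / time

# With a 20% serial core, even unbounded workers cap out near 5x:
# project_speedup(1000, 0.2) ≈ 5
# And with any nonzero coordination cost, enough workers makes things worse.
```

On this toy model, a self-improver that can “lubricate” some of the friction corresponds to shrinking `serial` and `overhead`; the disagreement upthread is effectively about how far those parameters can be pushed, not about whether the limits exist.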
I believe you are correct about the importance of taking these phenomena into account: the indivisibility of certain serial tasks, and the coordination overhead of larger teams. I do think that my model takes these into account.
It’s certainly possible that my model is wrong. I feel like there’s a lot of uncertainty in many key variables, and likely I have overlooked things. The phenomena you point out don’t happen to be things that I neglected to consider though.
I understand; my point is more that the difference between these two positions could be readily explained by you being slightly more optimistic in your task-time estimates when doing the accounting, and the voice of experience saying “take your best estimate of the task time, double it, and that’s what it actually is”.