There are probably not many civilizations that wait until 2022 to make this list, and yet survive.
I don’t think making this list in 1980 would have been meaningful. How do you offer any sort of coherent, detailed plan for dealing with something when all you have is toy examples like ELIZA?
We didn’t even have the concept of machine learning back then—everything computers did in 1980 was relatively easily understood by humans, in a very basic step-by-step way. Making a 1980s computer “safe” is a trivial task, because we hadn’t yet developed any technology that could do something “unsafe” (i.e. beyond our understanding). A computer in the 1980s couldn’t lie to you, because you could just inspect the code and memory and find out the actual reality.
What makes you think this would have been useful?
Do we have any historical examples to guide us in what this might look like?
I think most worlds that successfully navigate AGI risk have properties like:
AI results aren’t published publicly, going back to more or less the field’s origin.
The research community deliberately steers toward relatively alignable approaches to AI, which includes steering away from approaches that look like ‘giant opaque deep nets’.
This means that you need to figure out what makes an approach ‘alignable’ earlier, which suggests much more research on getting de-confused regarding alignable cognition.
Many such deconfusions will require a lot of software experimentation, but the kind of software/ML that helps you learn a lot about alignment as you work with it is itself a relatively narrow target that you likely need to steer towards deliberately, based on earlier, weaker deconfusion progress. I don’t think having DL systems on hand to play with has helped humanity learn much about alignment thus far, and by default, I don’t expect humanity to get much more clarity on this before AGI kills us.
Researchers focus on trying to predict features of future systems, and trying to get mental clarity about how to align such systems, rather than focusing on ‘align ELIZA’ just because ELIZA is the latest hot new thing. Make and test predictions, back-chain from predictions to ‘things that are useful today’, and pick actions that are aimed at steering — rather than just wandering idly from capabilities fad to capabilities fad.
(Steering will often fail. But you’ll definitely fail if you don’t even try. None of this is easy, but to date humanity hasn’t even made an attempt.)
In this counterfactual world, deductive reasoners and expert systems were only ever considered a set of toy settings for improving our intuitions, never a direct path to AGI.
(I.e., the civilization was probably never that level of confused about core questions like ‘how much of cognition looks like logical deduction?’; their version of Aristotle or Plato, or at least Descartes, focused on quantitative probabilistic reasoning. It’s an adequacy red flag that our civilization was so confused about so many things going into the 20th century.)
To me, all of this suggests a world where you talk about alignment before you start seeing crazy explosions in capabilities. I don’t know what you mean by “we didn’t even have the concept of machine learning back then”, but I flatly don’t buy that the species that landed on the Moon isn’t capable of generating a (more disjunctive version of) the OP’s semitechnical concerns pre-AlexNet.
You need the norm of ‘be able to discuss things before you have overwhelming empirical evidence’, and you need the skill of ‘be good at reasoning about such things’, in order to solve alignment at all; so it’s a no-brainer that not-wildly-incompetent civilizations at least attempt literally any of this.
“most worlds that successfully navigate AGI risk” is kind of a strange framing to me.
For one thing, it represents p(our world | success) and we care about p(success | our world). To convert between the two you of course need to multiply by p(success) / p(our world). What’s the prior distribution of worlds? This seems underspecified.
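To make the conversion explicit: it is just Bayes’ rule, writing S for “success” and W for “our world” (a sketch of the inference being questioned, not anything the original comment spells out):

```latex
P(S \mid W) \;=\; P(W \mid S)\,\frac{P(S)}{P(W)}
```

The complaint, then, is that reasoning from “most successful worlds look like X” only tells you about P(W | S); without a defensible prior P(S) and a measure over worlds giving P(W), the quantity we actually care about, P(S | W), is left underspecified.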
For another, using the methodology “think about whether our civilization seems more competent than the problem is hard” or “whether our civilization seems on track to solve the problem” I might have forecast nuclear annihilation (not sure about this).
The methodology seems to work when we’re relatively certain about the level of difficulty on the mainline, so if I were more sold on that I would believe this more. It would still feel kind of weird though.
I don’t think making this list in 1980 would have been meaningful. How do you offer any sort of coherent, detailed plan for dealing with something when all you have is toy examples like ELIZA?
I mean, I think many of the computing pioneers ‘basically saw’ AI risk. I noted some surprise that I. J. Good didn’t write the precursor to this list in 1980; apparently Wikipedia claims there was an unpublished 1998 statement of his about AI x-risk. It’d be interesting to see what it contains and how much it does or doesn’t line up with our modern conception of why the problem is hard.
The historical figures who basically saw it (George Eliot 1879: “will the creatures who are to transcend and finally supersede us be steely organisms [...] performing with infallible exactness more than everything that we have performed with a slovenly approximativeness and self-defeating inaccuracy?”; Turing 1951: “At some stage therefore we should have to expect the machines to take control”) seem to have done so in the spirit of speculating about the cosmic process. The idea of coming up with a plan to solve the problem is an additional act of audacity; that’s not really how things have ever worked so far. (People make plans about their own lives, or their own businesses; at most, a single country; no one plans world-scale evolutionary transitions.)
I’m tempted to call this a meta-ethical failure. Fatalism, universal moral realism, and just-world intuitions seem to be the underlying implicit heuristics or principles that would cause this “cosmic process” thought-blocker.