Eliezer Yudkowsky comments on AGI Ruin: A List of Lethalities

Eliezer Yudkowsky 10 Jun 2022 5:15 UTC
LW: 130 AF: 34
Consider my vote to be placed that you should turn this into a post, keep going for literally as long as you can, expand things to paragraphs, and branch out beyond things you can easily find links for.
(I do think there’s a noticeable extent to which I was trying to list difficulties more central than those, but I also think many people could benefit from reading a list of 100 noncentral difficulties.)
- DPiepgrass 21 Jul 2022 5:29 UTC
  14 points
  “I do think there’s a noticeable extent to which I was trying to list difficulties more central than those”
  Probably people disagree about which things are more central, or as evhub put it:
  “Every time anybody writes up any overview of AI safety, they have to make tradeoffs [...] depending on what the author personally believes is most important/relevant to say”
  Now FWIW, I thought evhub was overly dismissive of (4), in which you made an important meta-point:
  EY: 4. We can’t just “decide not to build AGI” because GPUs are everywhere, and knowledge of algorithms is constantly being improved and published; 2 years after the leading actor has the capability to destroy the world, 5 other actors will have the capability to destroy the world. The given lethal challenge is to solve within a time limit, driven by the dynamic in which, over time, increasingly weak actors with a smaller and smaller fraction of total computing power, become able to build AGI and destroy the world. Powerful actors all refraining in unison from doing the suicidal thing just delays this time limit—it does not lift it [...]
  evhub: This is just answering a particular bad plan.
  But I would add a criticism of my own, that this “List of Lethalities” somehow just takes it for granted that AGI will try to kill us all without ever specifically arguing that case. Instead you just argue vaguely in that direction, in passing, while making broader/different points:
  “an AGI strongly optimizing on that signal will kill you, because the sensory reward signal was not a ground truth about alignment” (???)
  “All of these kill you if optimized-over by a sufficiently powerful intelligence, because they imply strategies like ‘kill everyone in the world using nanotech to strike before they know they’re in a battle, and have control of your reward button forever after’.” (I guess that makes sense)
  “If you perfectly learn and perfectly maximize the referent of rewards assigned by human operators, that kills them.” (???)
  Perhaps you didn’t bother because your audience is meant to be people who already believe this? I would at least expect to see it in the intro: “-5. unaligned superintelligences tend to try to kill everyone, here’s why <link>. … -4. all the most obvious proposed solutions to (-5) don’t work, here’s why <link>”.