Inner alignment failure is a phenomenon that has happened in existing AI systems, weak as they are. So we know it can happen. We are on track to build many superhuman AI systems. Unless something unexpectedly good happens, eventually we will build one that has a failure of inner alignment. And then it will kill us all. Does the probability of any given system failing inner alignment really matter?
Yes, because if the first superhuman AGI is aligned, and if it performs a pivotal act to prevent misaligned AGI from being created, then we will avert existential catastrophe.
If there is a 99.99% chance of that happening, then we should be quite sanguine about AI x-risk. On the other hand, if there is only a 0.01% chance, then we should be very worried.
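To put that contrast in rough numbers, here is a toy sketch (my own illustrative assumptions, not anything established in this thread): suppose each superhuman system independently fails inner alignment with probability $p$, and $N$ such systems eventually get built.

$$
P(\text{catastrophe} \mid \text{no pivotal act}) = 1 - (1-p)^N \;\longrightarrow\; 1 \text{ as } N \to \infty, \text{ for any } p > 0,
$$
$$
P(\text{catastrophe} \mid \text{pivotal act by the first AGI}) \approx P(\text{first AGI is misaligned}).
$$

Under the first model the per-system probability barely matters, which is the parent comment's point; under the second, it is essentially the only thing that matters.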
It’s hard to guess, but it happened when the only general intelligence known to us was created by a hill-climbing process.
I think it’s inappropriate to call evolution a “hill-climbing process” in this context, since those words seem optimized to sneak in parallels to SGD. Separately, I think that evolution is a bad analogy for AGI training.
This seems super important to the argument! Do you know if it’s been discussed in detail anywhere else?