Imagine we’re all in a paddleboat paddling towards a waterfall. Everyone is inside the paddleboat, but only a relatively small number of them are doing the paddling. Of those paddling, most are aware of the waterfall ahead but, for reasons beyond my comprehension, decide to paddle on anyway. A smaller group of paddlers have realised their predicament and have decided to stop paddling and start building wings onto the paddleboat, so that when it inevitably hurtles off the waterfall, it might fly.
It seems to me like the most sensible course of action is to stop paddling until the wings are built and we know for sure they’re going to work. So why isn’t the main strategy definitively proving that we’re heading towards the waterfall and raising awareness until the culture has shifted enough that paddling is taboo? With this strategy, even if the paddling doesn’t stop, at least it buys time for the wings to be constructed. Trying to get people to stop paddling seems to have a higher probability of success than wing-building, and it also increases the probability that wing-building succeeds, because it buys time.
I suspect that part of the reason for just focusing on the wings is the desire to reap the rewards of aligned AGI within our lifetimes. The clout of being the ones who did the final work. The immortality. The benefits we can’t yet imagine, and so on. Maybe infinite reward justifies infinite risk, but that does not apply in this case, because we can still get the infinite rewards with far less risk if we just wait until the risks are eliminated.
If eliminating the risk takes 80+ years and AI development is paused until that is complete, then it is very likely that everyone currently reading this comment will die before it is finished. From a purely selfish point of view, it can easily make sense for a researcher to continue even if they fully believe there is a 90%+ chance that AI will kill them. Waiting will also almost certainly kill them, and they won’t get any of those infinite rewards anyway.
Being less than 90% convinced that AI will kill them just makes continuing even more attractive. Hyperbolic discounting makes it more attractive still.
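To make that selfish calculus concrete, here is a minimal sketch with made-up, purely illustrative numbers (none of them come from the comments above) of how hyperbolic discounting tilts the decision towards racing ahead even at high personal risk:

```python
# Illustrative only: the probabilities, reward size, and discount rate below
# are assumptions for the sake of the example, not anyone's actual estimates.

def hyperbolic_discount(value, delay_years, k=0.2):
    """Hyperbolic discounting: V = A / (1 + k * D)."""
    return value / (1 + k * delay_years)

REWARD = 1000.0  # subjective value of living to see aligned AGI (arbitrary units)

# Option A: race ahead, AGI in ~10 years, assume a 10% chance you survive it.
race = 0.10 * hyperbolic_discount(REWARD, delay_years=10)

# Option B: pause for ~80 years of safety work, assume a 10% chance you live
# that long by ordinary means and then get the same reward.
wait = 0.10 * hyperbolic_discount(REWARD, delay_years=80)

print(f"race ahead: {race:.1f}")   # ~33.3
print(f"wait it out: {wait:.1f}")  # ~5.9
```

With no discounting (k = 0) the two options tie; the hyperbolic term is what breaks the tie in favour of racing, which is the “even more attractive still” effect.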
It’s not obvious to me that it takes 80+ years to get double-digit alignment success probabilities, from where we are. Waiting a few decades strikes me as obviously smart from a selfish perspective; e.g., AGI in 2052 is a lot selfishly better than AGI in 2032, if you’re under age 50 today.
But also, I think the current state of humanity’s alignment knowledge is very bad. I think your odds of surviving into the far future are a lot higher if you die in a few decades and get cryopreserved and then need to hope AGI works out in 80+ years, than if you survive to see AGI in the next 20 years.
True, because mortality follows a Gompertz curve, you can get marginal benefit from waiting a bit while your annual probability of non-AGI death is still low.
So we only need to worry about researchers who have lower estimates of unaligned AGI causing their death, or who think that AGI is a long way out and want to hurry it up now.
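To put rough numbers on the Gompertz point above, here is a sketch using ballpark Gompertz parameters (assumed for illustration, not fitted actuarial values) of how much ordinary mortality a 20-year wait costs at different ages:

```python
import math

# Ballpark Gompertz mortality sketch. A and B are assumed illustrative values,
# not fitted actuarial parameters.
A, B = 4e-5, 0.09  # baseline hazard per year, and its exponential growth with age

def p_death(age, years):
    """Probability of dying of ordinary (non-AGI) causes between `age` and
    `age + years`, assuming the hazard rate is A * exp(B * age)."""
    integral = (A / B) * (math.exp(B * (age + years)) - math.exp(B * age))
    return 1 - math.exp(-integral)

for age in (30, 50, 70):
    print(f"age {age}: P(die in the next 20 years) ≈ {p_death(age, 20):.0%}")
# age 30: ≈ 3%
# age 50: ≈ 18%
# age 70: ≈ 71%
```

On assumptions like these, someone under 50 gives up only a few percentage points of survival probability by waiting a couple of decades, which is why the marginal cost of waiting is low while you’re young and grows steeply later.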
Unfortunately, cryopreservation isn’t nearly as reliable as it would need to be to assume immortality is achieved. While we’ve gotten better at it, it still relies on toxic chemicals to vitrify the brain.

I’m not saying it’s reliable!! I’m saying the odds of alignment success in the next 20 years currently look even worse.
Well then let’s use hyperbolic discounting to our advantage. If we make paddling sufficiently taboo, the social punishment of paddling will outweigh the rewards of potentially building AGI in the minds of the selfish researchers.
Dunno what that last sentence was but generally I agree.
At the same time: be the change you wish to see in the world. Don’t just tell people who are already working on it they should be doing something else. Actually do that raising the alarm thing first.
What I’m doing is trying to help with the wings by throwing some money at MIRI. I am also helping with the stopping/slowing of paddling by sharing my very simple reasoning about why that’s the most sensible course of action. Hopefully the simple idea will spread and have some influence.
To be honest, I am not willing to invest that much into this as I have other things I am working on (sounds so insane to type that I am not willing to invest much into preventing the doom of everyone and everything). Anyway, there are many like me who are willing to help, but only if the cost is low, so if you have any ideas of what people like me could do to shift the probabilities a bit, let me know.
Sadly, it doesn’t seem like there’s any low-hanging fruit that would even “shift the probabilities a bit”.
Most people seem, if anything, anti-receptive to any arguments about this, because, e.g., it’s ‘too weird’.
And I too feel like this describes myself:
“To be honest, I am not willing to invest that much into this as I have other things I am working on (sounds so insane to type that I am not willing to invest much into preventing the doom of everyone and everything).”
I’m thinking – very tentatively (sadly) – about maybe looking into my own personal options for some way to help, but I’m also distracted by “other things”.
I find this – even people who are (at least somewhat) convinced still not being willing to basically ‘throw everything else away’ (up to the limit of what would impair our abilities to actually help, if not succeed) – to be particularly strong evidence that this might be overall effectively impossible.