If you condition on misaligned AI takeover, my current (extremely rough) probabilities are:
50% chance the AI kills >99% of people
Conditional on killing >99% of people, 2⁄3 chance the AI kills literally everyone
Edit: I now think mass death and extinction are notably less likely than these probabilities. Perhaps more like 40% on >50% of people killed and 20% on >99% of people killed.
By ‘kill’ here I’m not including things like ‘the AI cryonically preserves everyone’s brains and then revives people later’. I’m also not including cases where the AI lets everyone live a normal human lifespan but fails to grant immortality or continue human civilization beyond this point.
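For concreteness, here's a minimal sketch of the arithmetic these estimates imply for the chance the AI kills literally everyone (still conditional on takeover), assuming the 2⁄3 conditional also applies to the edited figure (the edit doesn't restate it, so that part is just an assumption):

```python
# Rough sketch: implied P(AI kills literally everyone | misaligned takeover).
# Assumption: the 2/3 conditional (everyone dies | >99% killed) is held fixed
# for the edited estimate as well; the edit above doesn't restate it.

p_kill_99_original = 0.50  # original estimate: P(kills >99% of people | takeover)
p_kill_99_edited = 0.20    # edited estimate:   P(kills >99% of people | takeover)
p_all_given_99 = 2 / 3     # P(kills literally everyone | kills >99%)

print(p_kill_99_original * p_all_given_99)  # ~0.33 (original)
print(p_kill_99_edited * p_all_given_99)    # ~0.13 (edited)
```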
My beliefs here come from a combination of causal/acausal trade arguments and the intuition that AIs are likely to be slightly cooperative/nice, either for decision-theoretic reasons (mostly ECL) or just moral reasons.
To be clear, it seems totally insane to depend on this or to think it makes the situation ok. Further, note that I think it’s reasonably likely that there is a bloody and horrible conflict between AIs and humanity (it just seems unlikely that this conflict kills >99% of people, so the question doesn’t come down to conflict). Edit: ‘seems unclear’ rather than ‘seems unlikely’. I think a conflict between AIs and humans that kills >99% of people is plausible, but not likely enough that I’d be confident humans die.
Note that the trade and niceness bar might be extremely low as you discuss here:
Me: Remember that it still needs to get more of what it wants, somehow, on its own superintelligent expectations. Someone still needs to pay it. There aren’t enough simulators above us that care enough about us-in-particular to pay in paperclips. There are so many things to care about! Why us, rather than giant gold obelisks? The tiny amount of caring-ness coming down from the simulators is spread over far too many goals; it’s not clear to me that “a star system for your creators” outbids the competition, even if star systems are up for auction.
Maybe some friendly aliens somewhere out there in the Tegmark IV multiverse have so much matter and such diminishing marginal returns on it that they’re willing to build great paperclip-piles (and gold-obelisk totems and etc. etc.) for a few spared evolved-species. But if you’re going to rely on the tiny charity of aliens to construct hopeful-feeling scenarios, why not rely on the charity of aliens who anthropically simulate us to recover our mind-states… or just aliens on the borders of space in our universe, maybe purchasing some stored human mind-states from the UFAI (with resources that can be directed towards paperclips specifically, rather than a broad basket of goals)?
Yeah, all of these scenarios with aliens seem sufficiently plausible to me that we should expect the AI to keep humans alive if it’s very cheap to do so (which is what I expect).
Note that both common-sense moral views and views like UDASSA imply that you should particularly value currently alive humans over future beings. I find this position somewhat implausible, and none of these views seems stable under reflection. Regardless, it does hint at the idea that future humans or aliens might place considerable value on keeping human civilization going. If you don’t particularly value currently alive humans, then I agree that you just do the thing you’d like in your universe, or you trade for asks other than keeping humans alive right now.
I also think a relatively strong version of the acausal trade arguments seems plausible. Specifically, it seems plausible that after the dust settles, the universe looks basically similar whether an AI takes over or humans keep control. (Note that this doesn’t imply alignment is unimportant: our alignment work is directly logically entangled with total resources. For those worried about currently alive humans, you should possibly be very worried about what happens before the dust settles...)
Overall, I’m confused about why you seem so insistent on making such a specific technical point, one which seems insanely sensitive to various hard-to-predict details about the future. Further, it depends on rounding errors in future resource allocation, which makes the situation particularly sensitive to random questions about how aliens behave, etc.
Why are you at 50% on the AI killing >99% of people, given the points you make in the other direction?
My probabilities are very rough, but I’m feeling more like 1⁄3 ish today after thinking about it a bit more. Shrug.
As far as reasons for it being this high:
Conflict seems plausible to reach this level of lethality (see the edit above; I think I was a bit unclear or incorrect)
AIs might not care about acausal trade considerations until it’s too late (seems unclear)
Future humans/AIs/aliens might decide it isn’t morally important to particularly privilege currently alive humans
Generally, I’m happy to argue for ‘we should be pretty confused, and there are a decent number of good reasons why AIs might keep humans alive’. I’m not confident in survival overall, though...