Two things:
For myself, I would not feel comfortable using language as confident-sounding as “on the default trajectory, AI is going to kill everyone” if I assigned (e.g.) 10% probability to “humanity [gets] a small future on a spare asteroid-turned-computer or an alien zoo or maybe even star”. I just think that scenario’s way, way less likely than that.
I’d be surprised if Nate assigns 10+% probability to scenarios like that, but he can speak for himself. 🤷‍♂️
I think some people at MIRI have significantly lower p(doom)? And I don’t expect those people to use language like “on the default trajectory, AI is going to kill everyone”.
I agree with you that there’s something weird about making lots of human-extinction-focused arguments when the thing we care more about is “does the cosmic endowment get turned into paperclips”? I do care about both of those things, an enormous amount; and I plan to talk about both of those things to some degree in public communications, rather than treating it as some kind of poorly-kept secret that MIRI folks care about whether flourishing interstellar civilizations get a chance to exist down the line. But I have this whole topic mentally flagged as a thing to be thoughtful and careful about, because it at least seems like an area that contains risk factors for future deceptive comms. E.g., if we update later to expecting the cosmic endowment to be wasted but all humans not dying, I would want us to adjust our messaging even if that means sacrificing some punchiness in our policy outreach.
Currently, however, I think the particular scenario “AI keeps a few flourishing humans around forever” is incredibly unlikely, and I don’t think Eliezer, Nate, etc. would say things like “this has a double-digit probability of happening in real life”? And, to be honest, the idea of myself and my family and friends and every other human being all dying in the near future really fucks me up and does not seem in any sense OK, even if (with my philosopher-hat on) I think this isn’t as big of a deal as “the cosmic endowment gets wasted”.
So I don’t currently feel bad about emphasizing a true prediction (“extremely likely that literally all humans literally nonconsensually die by violent means”), even though the philosophy-hat version of me thinks that the separate true prediction “extremely likely 99+% of the potential value of the long-term future is lost” is more morally important than that. Though I do feel obliged to semi-regularly mention the whole “cosmic endowment” thing in my public communication too, even if it doesn’t make it into various versions of my general-audience 60-second AI risk elevator pitch.
Thanks, this is clarifying from my perspective.
My remaining uncertainty is why you think AIs are so unlikely to keep humans around and treat them reasonably well (e.g. let them live out full lives).
From my perspective the argument that it is plausible that humans are treated well [even if misaligned AIs end up taking over the world and gaining absolute power] goes something like this:
If it only cost <1/million of overall resources to keep a reasonable fraction of humans alive and happy, it’s reasonably likely that misaligned AIs with full control would keep humans alive and happy due to either:
Acausal trade/decision theory
The AI terminally caring at least a bit about being nice to humans (perhaps because it cares a bit about respecting existing nearby agents, or perhaps because it has at least a bit of human-like values).
It is pretty likely that it costs <1/million of overall resources (from the AI’s perspective) to keep a reasonable fraction of humans alive and happy. Humans are extremely cheap to keep around asymptotically, and I think it can be pretty cheap even initially, especially if you’re a very smart AI. (A rough back-of-envelope sketch follows below.)
(See links in my prior comment for more discussion.)
(I also think the argument goes through for 1/billion, but I thought I would focus on the higher value for now.)
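As a rough illustration of the cheapness claim (my own back-of-envelope sketch, not a figure from this thread; the solar fraction and star count are loose order-of-magnitude assumptions):

```python
# Back-of-envelope check of the "<1/million of overall resources" claim.
# All figures are rough, illustrative assumptions.

sun_fraction_hitting_earth = 4.5e-10  # Earth's cross-section over a full sphere at 1 AU
reachable_stars = 1e21                # loose order-of-magnitude guess for reachable stars

# Fraction of total stellar output spent keeping one Earth's worth of
# sunlight running for humans, asymptotically:
fraction = sun_fraction_hitting_earth / reachable_stars
print(f"fraction of cosmic resources: {fraction:.1e}")  # ~4.5e-31

print(fraction < 1e-6)  # True: far below the 1/million threshold
print(fraction < 1e-9)  # True: also below the 1/billion threshold
```

On this kind of estimate, the asymptotic cost of keeping humans around is many orders of magnitude below either threshold; the more contestable part of the claim is the near-term cost, which the sketch doesn’t address.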
Where do you disagree with this argument?