Rob Bensinger comments on MIRI announces new “Death With Dignity” strategy

Rob Bensinger 2 Apr 2022 18:23 UTC
75 points
0
+1 for asking the 101-level questions! Superintelligence, “AI Alignment: Why It’s Hard, and Where to Start”, “There’s No Fire Alarm for Artificial General Intelligence”, and the “Security Mindset” dialogues (part one, part two) do a good job of explaining why people are super worried about AGI.
“There’s no hope for survival” is an overstatement; the OP is arguing “successfully navigating AGI looks very hard, enough that we should reconcile ourselves with the reality that we’re probably not going to make it”, not “successfully navigating AGI looks impossible / negligibly likely, such that we should give up”.
If you want specific probabilities, here’s a survey I ran last year: https://www.lesswrong.com/posts/QvwSr5LsxyDeaPK5s/existential-risk-from-ai-survey-results. Eliezer works at MIRI (as do I), and MIRI views tended to be the most pessimistic.
- Adam Zerner 3 Apr 2022 2:18 UTC
  34 points
  Parent
  
  +1 for asking the 101-level questions!
  
  Yes! Very much +1! I’ve been hanging around here for almost 10 years and am getting value from the response to the 101-level questions.
  
  Superintelligence, “AI Alignment: Why It’s Hard, and Where to Start”, “There’s No Fire Alarm for Artificial General Intelligence”, and the “Security Mindset” dialogues (part one, part two) do a good job of explaining why people are super worried about AGI.
  
  Honestly, the Wait But Why posts are my favorite and what I would recommend to a newcomer. (I was careful to say things like “favorite” and “what I’d recommend to a newcomer” instead of “best”.)
  
  Furthermore, I feel like Karolina and people like them who are asking this sort of question deserve an answer that doesn’t require an investment of multiple hours of effortful reading and thinking. I’m thinking something like three paragraphs. Here is my attempt at one. Take it with the appropriate grain of salt. I’m just someone who’s hung around the community for a while but doesn’t deal with this stuff professionally.
  
  Think about how much smarter humans are than dogs. Our intelligence basically gives us full control over them. We’re technologically superior. Plus we understand their psychology and can manipulate them into doing what we want pretty reliably. Now take that and multiply by, uh, a really big number. An AGI would be wayyyyy smarter than us.
  
  Ok, but why do we assume there will be this super powerful AGI? Well, first of all, if you look at surveys, it looks like pretty much everyone does in fact believe it. It’s not something that people disagree on. They just disagree on when exactly it will happen. 10 years? 50 years? 200 years? But to address the question of how, well, the idea is that the smarter a thing is, the more capable it is of improving it’s own smartness. So suppose something starts off with 10 intelligence points and tries to improve and goes to 11. At 11 it’s more capable of improving than it was at 10, so it goes to a 13. +2 instead of +1. At 13 it’s even more capable of improving, so it goes to a 16. Then a 20. Then a 25. So on and so forth. Accelerating growth is powerful.
  
  Ok, so suppose we have this super powerful thing that is inevitable and is going to have complete control over us and the universe. Why can’t we just program the thing from the start to be nice? This is the alignment problem. I don’t have the best understanding of it tbh, but I like to think about the field of law and contracts. Y’know how those terms of service documents are so long and no one reads them? It’s hard to specify all of the things you want upfront. “Oh yeah, please don’t do A. Oh wait I forgot B, don’t do that either. Oh and C, yeah, that’s another important one.” Even I understand that this is a pretty bad description of the alignment problem, but hopefully it at least serves the purpose of providing some amount of intuition.
  
  “There’s no hope for survival” is an overstatement; the OP is arguing “successfully navigating AGI looks very hard, enough that we should reconcile ourselves with the reality that we’re probably not going to make it”, not “successfully navigating AGI looks impossible / negligibly likely, such that we should give up”.
  
  Really? That wasn’t my read on it. Although I’m not very confident in my read on it. I came away from the post feeling confused.
  
  For example, 0% was mentioned in a few places, like:
  
  When Earth’s prospects are that far underwater in the basement of the logistic success curve, it may be hard to feel motivated about continuing to fight, since doubling our chances of survival will only take them from 0% to 0%.
  
  Was that April Fools context exaggeration? My understanding is that what Vaniver said here is accurate: that Eliezer is saying that we’re doomed, but that he’s saying it on April Fools so that people who don’t want to believe it for mental health reasons (nervously starts raising hand) have an out. Maybe that’s not accurate though? Maybe it’s more like “We’re pretty doomed but I’m exaggerating/joking about how doomed we are because it’s April Fools Day.”
  What links here?
  - Adam Zerner's comment on MIRI announces new “Death With Dignity” strategy by Eliezer Yudkowsky (3 Apr 2022 22:42 UTC; 4 points)
  - Rob Bensinger 3 Apr 2022 3:41 UTC
    14 points
    Parent
    I think the Wait But Why posts are quite good, though I usually link them alongside Luke Muehlhauser’s reply.
    For example, 0% was mentioned in a few places
    It’s obviously not literally 0%, and the post is explicitly about ‘how do we succeed?’, with a lengthy discussion of the possible worlds where we do in fact succeed:
    [...] The surviving worlds look like people who lived inside their awful reality and tried to shape up their impossible chances; until somehow, somewhere, a miracle appeared—the model broke in a positive direction, for once, as does not usually occur when you are trying to do something very difficult and hard to understand, but might still be so—and they were positioned with the resources and the sanity to take advantage of that positive miracle, because they went on living inside uncomfortable reality. Positive model violations do ever happen, but it’s much less likely that somebody’s specific desired miracle that “we’re all dead anyways if not...” will happen; these people have just walked out of the reality where any actual positive miracles might occur. [...]
    The whole idea of ‘let’s maximize dignity’ is that it’s just a reframe of ‘let’s maximize the probability that we survive and produce a flourishing civilization’ (the goal of the reframe being to guard against wishful thinking):
    Obviously, the measuring units of dignity are over humanity’s log odds of survival—the graph on which the logistic success curve is a straight line. [...] But if enough people can contribute enough bits of dignity like that, wouldn’t that mean we didn’t die at all? Yes, but again, don’t get your hopes up.
    Hence:
    Q1: Does ‘dying with dignity’ in this context mean accepting the certainty of your death, and not childishly regretting that or trying to fight a hopeless battle?
    Don’t be ridiculous. How would that increase the log odds of Earth’s survival?
    I dunno what the ‘0%’ means exactly, but it’s obviously not literal. My read of it was something like ‘low enough that it’s hard enough to be calibrated about exactly how low it is’, plus ‘low enough that you can make a lot of progress and still not have double-digit success probability’.
    - Adam Zerner 3 Apr 2022 22:42 UTC
      4 points
      Parent
      
      I think the Wait But Why posts are quite good, though I usually link them alongside Luke Muehlhauser’s reply.
      
      Cool! Good to get your endorsement. And thanks for pointing me to Muehlhauser’s reply. I’ll check it out.
      
      I dunno what the ‘0%’ means exactly, but it’s obviously not literal. My read of it was something like ‘low enough that it’s hard enough to be calibrated about exactly how low it is’, plus ‘low enough that you can make a lot of progress and still not have double-digit success probability’.
      
      Ok, yeah, that does sound pretty plausible. It still encompasses a pretty wide range though. Like, it could mean one in a billion, I guess? There is a grim tone here. And Eliezer has spoken about his pessimism elsewhere with a similarly grim tone. Maybe I am overreacting to that. I dunno. Like Raemon, I am still feeling confused.
      
      It’s definitely worth noting though that even at a low number like one in a billion, it is still worth working on for sure. And I see that Eliezer believes this as well. So in that sense I take back what I said in my initial comment.
      - Rob Bensinger 6 Apr 2022 5:22 UTC
        6 points
        Parent
        One in a billion seems way, way, way too low to me. (Like, I think that’s a crazy p(win) to have, and I’d be shocked beyond shocked if Eliezer’s p(win) were that low. Like, if he told me that I don’t think I’d believe him.)
        Adam Zerner 6 Apr 2022 16:32 UTC
        3 points
        Parent
        Ok, that’s good to hear! Just checking, do you feel similarly about 1 in 100k?
        Rob Bensinger 7 Apr 2022 2:10 UTC
        6 points
        Parent
        No, that’s a lot lower than my probability but it doesn’t trigger the same ‘that can’t be right, we must be miscommunicating somehow’ reaction.
        Adam Zerner 7 Apr 2022 5:23 UTC
        3 points
        Parent
        I see. Thanks for the response.
    - Raemon 3 Apr 2022 3:48 UTC
      4 points
      Parent
      FYI I am finding myself fairly confused about the “0%” line. I don’t see a reason not to take Eliezer at his word that he meant 0%. “Obviously not literal” feels pretty strong, if he meant a different thing I’d prefer the post say whatever he meant.
      What links here?
      Adam Zerner's comment on MIRI announces new “Death With Dignity” strategy by Eliezer Yudkowsky (3 Apr 2022 22:42 UTC; 4 points)
      - habryka 4 Apr 2022 3:56 UTC
        17 points
        Parent
        Eliezer seemed quite clear to me when he said (paraphrased) “we are on the left side of the logistical success curve, where success is measured in significant digits of leading 0s you are removing from your probability of success”. The whole post seems to clearly imply that Eliezer thinks that marginal dignity is possible, which he defines as a unit of logistical movement on the probability of success. This clearly implies the probability is not literally 0, but it does clearly argue that the probability (on a linear scale) can be rounded to 0.
        Raemon 4 Apr 2022 4:47 UTC
        7 points
        Parent
        Mostly I had no idea if he meant like 0.1, 0.001, or 0.00000001. Also not sure if he’s more like “survival chance is 0%, probably, with some margin of error, maybe it’s 1%”, or “no, I’m making the confident claim that it’s more like 0.0001″
        (This was combined with some confusion about Nate Soares saying something in the vein of “if you don’t whole heartedly believe in your plans, you should multiply their EV by 0, and you’re not supposed to pick the plan who’s epsilon value is “least epsilon”)
        Also, MIRI isn’t (necessarily) a hive mind, so not sure if Rob, Nate or Abram actually share the same estimate of how doomed we are as Eliezer.
        AnnaSalamon 4 Apr 2022 5:12 UTC
        8 points
        Parent
        
        Also, MIRI isn’t (necessarily) a hive mind, so not sure if Rob, Nate or Abram actually share the same estimate of how doomed we are as Eliezer.
        
        Indeed, I expect that the views of at least some individuals working at MIRI vary considerably.
        
        In some ways, the post would seem more accurate to me if it had the Onion-esque headline: Eliezer announces on MIRI’s behalf that “MIRI adopts new ‘Death with Dignity’ strategy.”
        
        Still, I love the post a lot. Also, Eliezer has always been pivotal in MIRI.
        Rob Bensinger 6 Apr 2022 5:03 UTC
        8 points
        Parent
        The five MIRI responses in my AI x-risk survey (marked with orange dots) show a lot of variation in P(doom):
        (Albeit it’s still only five people; maybe a lot of MIRI optimists didn’t reply, or maybe a lot of pessimists didn’t, for some reason.)
        What links here?
        MondSemmel's comment on Beware boasting about non-existent forecasting track records by Jotto999 (22 May 2022 8:25 UTC; 11 points)
      - abramdemski 3 Apr 2022 18:42 UTC
        17 points
        Parent
        Personally, I took it to be 0% within an implied # of significant digits, perhaps in the ballpark of three.