(I ran this comment by Eliezer and Nate and they endorsed it.)
My model is that the post accurately and honestly represents Eliezer’s epistemic state (‘I feel super doomy about AI x-risk’), and a mindset that he’s found relatively motivating given that epistemic state (‘incrementally improve our success probability, without getting emotionally attached to the idea that these incremental gains will result in a high absolute success probability’), and is an honest suggestion that the larger community (insofar as it shares his pessimism) adopt the same framing for the sake of guarding against self-deception and motivated reasoning.
The parts of the post that are an April Fool’s Joke, AFAIK, are the title of the post, and the answer to Q6. The answer to Q6 is a joke because it’s sort-of-pretending the rest of the post is an April Fool’s joke. The title is a joke because “X’s new organizational strategy is ‘death with dignity’” sounds sort of inherently comical, and doesn’t really make sense (how is that a “strategy”? believing p(doom) is high isn’t a strategy, and adopting a specific mental framing device isn’t really a “strategy” either). (I’m even more confused by how this could be MIRI’s “policy”.)
In case it clarifies anything, here are some possible interpretations of ‘MIRI’s new strategy is “Death with Dignity”’, plus a crisp statement of whether the thing is true or false:
A plurality of MIRI’s research leadership, adjusted for org decision-making weight, thinks humanity’s success probability is very low, and will (continue to) make org decisions accordingly. — True, though:
Practically speaking, I don’t think this is wildly different from a lot of MIRI’s past history. E.g., Nate’s stated view in 2014 (assuming FT’s paraphrase is accurate), before he became ED, was “there is only a 5 per cent chance of programming sufficient safeguards into advanced AI”.
(Though I think there was at least one period of time in the intervening years where Nate had double-digit success probabilities for humanity — after the Puerto Rico conference and associated conversations, where he was impressed by the spirit of cooperation and understanding present and by how on-the-ball some key actors looked. He tells me that he later updated back downwards when the political situation degraded, and separately when he concluded the people in question weren’t that on-the-ball after all.)
MIRI is strongly in favor of its researchers building their own models and doing the work that makes sense to them; individual MIRI researchers’ choices of direction don’t require sign-off from Eliezer or Nate.
I don’t know exactly why Eliezer wrote a post like this now, but I’d guess the largest factors are roughly (1) that Eliezer and Nate have incrementally updated over the years from ‘really quite gloomy’ to ‘even gloomier’, (2) that they’re less confident about what object-level actions would currently best reduce p(doom), and (3) that as a consequence, they’ve updated a lot toward existential wins being likelier if the larger community moves toward having much more candid and honest conversations, and generally produces more people who are thinking exceptionally clearly about the problem.
Everyone on MIRI’s research team thinks our success probability is extremely low (say, below 5%). — False, based on a survey I ran a year ago. Only five MIRI researchers responded, so the sample might skew much more negative or positive than the overall distribution of views at MIRI; but MIRI responses to Q2 were (66%, 70%, 70%, 96%, 98%). I also don’t think the range of views has changed a ton in the intervening year.
MIRI will require (of present and/or future research staff) that they think in terms of “death with dignity”. — False, both in that MIRI isn’t in the business of dictating researchers’ p(doom) and in that MIRI isn’t in the business of dictating researchers’ motivational tools or framing devices.
MIRI has decided to give up on reducing existential risk from AI. — False, obviously.
MIRI is “locking in” pessimism as a core part of its org identity, such that it refuses to update toward optimism if the situation starts looking better. — False, obviously.
Other than the two tongue-in-cheek parts, AFAIK the post is just honestly stating Eliezer’s views, without any more hyperbole than a typical Eliezer post would have. E.g., the post is not “a preview of what might be needful to say later, if matters really do get that desperate”. Some parts of the post aren’t strictly literal (e.g., “0% probability”), but that’s because all of Eliezer’s posts are pretty colloquial, not because of a special feature of this post.
Thank you for the detailed response. It helps significantly.
The parts of the post that are an April Fool’s Joke, AFAIK, are the title of the post, and the answer to Q6. The answer to Q6 is a joke because it’s sort-of-pretending the rest of the post is an April Fool’s joke.
It shouldn’t be surprising that others are confused if this is your best guess about what the post means overall.
believing p(doom) is high isn’t a strategy, and adopting a specific mental framing device isn’t really a “strategy” either). (I’m even more confused by how this could be MIRI’s “policy”.)
Most would probably be as confused as you are by the notion that “dying with dignity” is a strategy. My read was that the title, stripped of hyperbole, referred not to a change in MIRI’s research agenda but to some more “meta-level” organizational philosophy.
I’m paraphrasing here, so correct me if I’m wrong, but some of the dialogues between Eliezer and other AI alignment researchers published in the last several months contained statements from Eliezer like “We [at least Nate and Eliezer] don’t think what MIRI has been doing for the last few years will work, and we don’t have a sense of what direction to go now”, and “I think maybe most other approaches in AI alignment have almost no chance of making any progress on the alignment problem.”
Maybe many people would have better understood what Eliezer meant had they read the entirety of the post(s) in question. Yet the posts were so long and complicated that Scott Alexander only bothered to write a summary of one of them, and there are several more.
As far as I’m aware, the reasoning motivating the kind of sentiments Eliezer expressed wasn’t much explained elsewhere. Between the confusion and concern that has caused, and the ambiguity of the above post, the hypothesis that MIRI’s strategy might right now be in a state of (temporary) incoherence was apparently plausible enough to a significant minority of readers.
The parts of your comment excerpted below are valuable, and might even have saved MIRI a lot of the work of deconfusing others had they been publicly stated at some point in the last few months:
A plurality of MIRI’s research leadership, adjusted for org decision-making weight, thinks humanity’s success probability is very low, and will (continue to) make org decisions accordingly.
MIRI is strongly in favor of its researchers building their own models and doing the work that makes sense to them; individual MIRI researchers’ choices of direction don’t require sign-off from Eliezer or Nate.
They [at least Eliezer and Nate] updated a lot toward existential wins being likelier if the larger community moves toward having much more candid and honest conversations, and generally produces more people who are thinking exceptionally clearly about the problem.
It shouldn’t be surprising that others are confused if this is your best guess about what the post means overall.
I don’t know what you mean by this—I don’t know what “this” is referring to in your sentence.
As far as I’m aware, the reasoning motivating the kind of sentiments Eliezer expressed wasn’t much explained elsewhere.
I mean, the big dump of chat logs is trying to make our background models clearer to people so that we can hopefully converge more. There’s an inherent tension between ‘say more stuff, in the hope that it clarifies something’ versus ‘say less stuff, so that it’s an easier read’. Currently I think the best strategy is to err on the side of over-sharing and posting long things, and then rely on follow-up discussion, summaries, etc. to address the fact that not everyone has time to read everything.
E.g., the three points you highlighted don’t seem like new information to me; I think we’ve said similar things publicly multiple times. But they can be new info to you, since you haven’t necessarily read the same resources I have.
Since everyone has read different things and will have different questions, I think the best solution is for you to just ask about the stuff that strikes you as the biggest holes in your MIRI-map.
I do think we’re overdue for a MIRI strategy post that collects a bunch of the take-aways we think are important in one place. This will inevitably be incomplete (or very long), but hopefully we’ll get something out in the not-distant future.
Between the confusion and concern that has caused,
I want to push back a bit against a norm I think you’re arguing for, along the lines of: we should impose much higher standards for sharing views that assert high p(doom), than for sharing views that assert low p(doom).
High and low p(doom) are both just factual claims about the world; an ideal Bayesian reasoner wouldn’t treat them super differently, and by default would apply just as much scrutiny, skepticism, and wariness to someone who seems optimistic about AI outcomes, as to someone who seems pessimistic.
In general, I want to be pretty cautious about proposed norms that might make people self-censor more if they have “concerning” views about object-level reality. There should be norms that hold here, but it’s not apparent to me that they should be stricter (or more strictly enforced) than for non-pessimistic posts.
the hypothesis that MIRI’s strategy might right now be in a state of (temporary) incoherence was apparently plausible enough to a significant minority of readers.
I still don’t know what incoherence you have in mind. Stuff like ‘Eliezer has a high p(doom)’ doesn’t strike me as good evidence for a ‘your strategy is incoherent’ hypothesis; high and low p(doom) are just different probabilities about the physical world.
I don’t know what “this” is referring to in your sentence.
I was referring to the fact that there are meta-jokes in the post about which parts are or are not jokes.
I want to push back a bit against a norm I think you’re arguing for, along the lines of: we should impose much higher standards for sharing views that assert high p(doom), than for sharing views that assert low p(doom).
I’m sorry I didn’t express myself more clearly. There shouldn’t be a higher standard for sharing views that assert a high(er) probability of doom; that’s not what I was arguing for. I’ve been under the impression that Eliezer and maybe others have been sharing the view that the probability of doom is extremely high, but without explaining their reasoning, or how their model changed from before. It’s the latter part that has been provoking confusion.
I still don’t know what incoherence you have in mind. Stuff like ‘Eliezer has a high p(doom)’ doesn’t strike me as good evidence for a ‘your strategy is incoherent’ hypothesis; high and low p(doom) are just different probabilities about the physical world.
With the reasons for Eliezer or others at MIRI being more pessimistic than ever before seeming unclear, one possibility that came to mind was that there isn’t enough self-awareness of why the model changed, or that MIRI has for a few months had no idea what direction to go in now. Either would lend itself to not having a coherent strategy at this time. Your reply has clarified, though, that it’s more that what MIRI’s strategic pivot will be is still in flux, or at least that communicating it well publicly will take some more time, so I’m no longer thinking any of that.
I do appreciate the effort you, Eliezer and others at MIRI have put into what you’ve been publishing. I eagerly await a strategy update from MIRI.
I’ll only mention one more thing, which hasn’t bugged me as much but has bugged others in conversations I’ve participated in. The issue is that Eliezer appears to think, without offering any follow-up, that most approaches to AI alignment distinct from MIRI’s, including ones that otherwise draw inspiration from the rationality community, will also fail to bear fruit. Presumably the takeaway isn’t that other alignment researchers should just give up, or just come work for MIRI... but then what is it?
The lack of an answer to that question has left some people feeling like they’ve been hung out to dry.
The issue is that Eliezer appears to think, without offering any follow-up, that most approaches to AI alignment distinct from MIRI’s, including ones that otherwise draw inspiration from the rationality community, will also fail to bear fruit. Presumably the takeaway isn’t that other alignment researchers should just give up, or just come work for MIRI... but then what is it?
From the AGI interventions discussion we posted in November (note that “miracle” here means “surprising positive model violation”, not “positive event of negligible probability”):
Anonymous
At a high level one thing I want to ask about is research directions and prioritization. For example, if you were dictator for what researchers here (or within our influence) were working on, how would you reallocate them?
Eliezer Yudkowsky
The first reply that came to mind is “I don’t know.” I consider the present gameboard to look incredibly grim, and I don’t actually see a way out through hard work alone. We can hope there’s a miracle that violates some aspect of my background model, and we can try to prepare for that unknown miracle; preparing for an unknown miracle probably looks like “Trying to die with more dignity on the mainline” (because if you can die with more dignity on the mainline, you are better positioned to take advantage of a miracle if it occurs).
[...]
Eliezer Yudkowsky
I have a few stupid ideas I could try to investigate in ML, but that would require the ability to run significant-sized closed ML projects full of trustworthy people, which is a capability that doesn’t seem to presently exist. Plausibly, this capability would be required in any world that got some positive model violation (“miracle”) to take advantage of, so I would want to build that capability today. I am not sure how to go about doing that either. [...] What I’d like to exist is a setup where I can work with people that I or somebody else has vetted as seeming okay-trustworthy, on ML projects that aren’t going to be published.
[...]
Anonymous
How do you feel about the safety community as a whole and the growth we’ve seen over the past few years?
Eliezer Yudkowsky
Very grim. I think that almost everybody is bouncing off the real hard problems at the center and doing work that is predictably not going to be useful at the superintelligent level, nor does it teach me anything I could not have said in advance of the paper being written. People like to do projects that they know will succeed and will result in a publishable paper, and that rules out all real research at step 1 of the social process.
Paul Christiano is trying to have real foundational ideas, and they’re all wrong, but he’s one of the few people trying to have foundational ideas at all; if we had another 10 of him, something might go right.
Chris Olah is going to get far too little done far too late. We’re going to be facing down an unalignable AGI and the current state of transparency is going to be “well look at this interesting visualized pattern in the attention of the key-value matrices in layer 47” when what we need to know is “okay but was the AGI plotting to kill us or not”. But Chris Olah is still trying to do work that is on a pathway to anything important at all, which makes him exceptional in the field.
The things I’d mainly recommend are interventions that:
Help ourselves think more clearly. (I imagine this including a lot of trying-to-become-more-rational, developing and following relatively open/honest communication norms, and trying to build better mental models of crucial parts of the world.)
Help relevant parts of humanity (e.g., the field of ML, or academic STEM) think more clearly and understand the situation.
Help us understand and resolve major disagreements. (Especially current disagreements, but also future disagreements, if we can e.g. improve our ability to double-crux in some fashion.)
Try to solve the alignment problem, especially via novel approaches.
In particular: the biggest obstacle to alignment seems to be ‘current ML approaches are super black-box-y and produce models that are very hard to understand/interpret’; finding ways to better understand models produced by current techniques, or finding alternative techniques that yield more interpretable models, seems like where most of the action is.
Think about the space of relatively-plausible “miracles”, think about future evidence that could make us quickly update toward a miracle-claim being true, and think about how we should act to take advantage of that miracle in that case.
Build teams and skills that are well-positioned to take advantage of miracles when and if they arise. E.g., build some group like Redwood into an org that’s world-class in its ability to run ML experiments, so we have that capacity already available if we find a way to make major alignment progress in the future.
This can also include indirect approaches, like ‘rather than try to solve the alignment problem myself, I’ll try to recruit physicists to work on it, because they might bring new and different perspectives to bear’.
Though I definitely think there’s a lot to be said for more people trying to solve the alignment problem themselves, even if they’re initially pessimistic they’ll succeed!
I think alignment is still the big blocker on good futures, and still the place where we’re most likely to see crucial positive surprises, if we see them anywhere—possibly Eliezer would disagree here.
Upvoted. Thanks.
I’ll state that, in my opinion, it shouldn’t necessarily have to be the responsibility of MIRI or even Eliezer to clarify what was meant by a stated position that gets taken out of context. I’m not sure, but it seems as though at least a significant minority of those who’ve been alarmed by some of Eliezer’s statements haven’t read the full post, which would put them in a less dramatic context.
Yet the errant signals that have been sent seem important to rectify, as the resulting misconceptions make it harder for MIRI to coordinate with other actors in the field of AI alignment.
My impression is that misunderstanding about all of this is widespread, in the sense that there are at least a few people in every part of the field who don’t understand what MIRI is about these days at all. I don’t know how widespread it is in the sense of what portion of other actors in the field are confused about MIRI overall.