It shouldn’t be surprising others are confused if this is your best guess about what the post means altogether.
I don’t know what you mean by this—I don’t know what “this” is referring to in your sentence.
As far as I’m aware, the reasoning motivating the kind of sentiments Eliezer expressed wasn’t much explained elsewhere.
I mean, the big dump of chat logs is trying to make our background models clearer to people so that we can hopefully converge more. There’s an inherent tension between ‘say more stuff, in the hope that it clarifies something’ versus ‘say less stuff, so that it’s an easier read’. Currently I think the best strategy is to err on the side of over-sharing and posting long things, and then rely on follow-up discussion, summaries, etc. to address the fact that not everyone has time to read everything.
E.g., the three points you highlighted don’t seem like new information to me; I think we’ve said similar things publicly multiple times. But they can be new info to you, since you haven’t necessarily read the same resources I have.
Since everyone has read different things and will have different questions, I think the best solution is for you to just ask about the stuff that strikes you as the biggest holes in your MIRI-map.
I do think we’re overdue for a MIRI strategy post that collects a bunch of the take-aways we think are important in one place. This will inevitably be incomplete (or very long), but hopefully we’ll get something out in the not-distant future.
Between the confusion and concern that has caused, that right now MIRI’s strategy might be in a position of (temporary) incoherence was apparently plausible enough to a significant minority of readers.
I want to push back a bit against a norm I think you’re arguing for, along the lines of: we should impose much higher standards for sharing views that assert high p(doom), than for sharing views that assert low p(doom).
High and low p(doom) are both just factual claims about the world; an ideal Bayesian reasoner wouldn’t treat them super differently, and by default would apply just as much scrutiny, skepticism, and wariness to someone who seems optimistic about AI outcomes, as to someone who seems pessimistic.
In general, I want to be pretty cautious about proposed norms that might make people self-censor more if they have “concerning” views about object-level reality. There should be norms that hold here, but it’s not apparent to me that they should be stricter (or more strictly enforced) than for non-pessimistic posts.
I still don’t know what incoherence you have in mind. Stuff like ‘Eliezer has a high p(doom)’ doesn’t strike me as good evidence for a ‘your strategy is incoherent’ hypothesis; high and low p(doom) are just different probabilities about the physical world.
I don’t know what “this” is referring to in your sentence.
I was referring to the fact that there are meta-jokes in the post about which parts are or are not jokes.
I want to push back a bit against a norm I think you’re arguing for, along the lines of: we should impose much higher standards for sharing views that assert high p(doom), than for sharing views that assert low p(doom).
I’m sorry I didn’t express myself more clearly. There shouldn’t be a higher standard for sharing views that assert a high(er) probability of doom; that’s not what I was arguing for. I’ve been under the impression that Eliezer, and maybe others, have been sharing an extremely high probability of doom, but without explaining their reasoning or how their models have changed from before. It’s the latter part that has been provoking confusion.
I still don’t know what incoherence you have in mind. Stuff like ‘Eliezer has a high p(doom)’ doesn’t strike me as good evidence for a ‘your strategy is incoherent’ hypothesis; high and low p(doom) are just different probabilities about the physical world.
Since the reasons for Eliezer or others at MIRI being more pessimistic than ever before seemed unclear, one possibility that came to mind was that there isn’t enough self-awareness of why their models changed, or that MIRI has had no idea for a few months what direction it’s going in now. Either would lend itself to not having a coherent strategy at this time. Your reply has clarified, though, that it’s more that MIRI’s strategic pivot is still in flux, or at least that communicating it well publicly will take some more time, so I’m no longer thinking any of that.
I do appreciate the effort you, Eliezer and others at MIRI have put into what you’ve been publishing. I eagerly await a strategy update from MIRI.
I’ll only mention one more thing that hasn’t bugged me as much but has bugged others in conversations I’ve participated in. The issue is that Eliezer appears to think, but without any follow-up, that most other approaches to AI alignment distinct from MIRI’s, including ones that otherwise draw inspiration from the rationality community, will also fail to bear fruit. Like, the takeaway isn’t that other alignment researchers should just give up, or just come work for MIRI... but then what is it?
The lack of an answer to that question has left some people feeling like they’ve been hung out to dry.
The issue is that Eliezer appears to think, but without any follow-up, that most other approaches to AI alignment distinct from MIRI’s, including ones that otherwise draw inspiration from the rationality community, will also fail to bear fruit. Like, the takeaway isn’t that other alignment researchers should just give up, or just come work for MIRI... but then what is it?
From the AGI interventions discussion we posted in November (note that “miracle” here means “surprising positive model violation”, not “positive event of negligible probability”):
Anonymous
At a high level one thing I want to ask about is research directions and prioritization. For example, if you were dictator for what researchers here (or within our influence) were working on, how would you reallocate them?
Eliezer Yudkowsky
The first reply that came to mind is “I don’t know.” I consider the present gameboard to look incredibly grim, and I don’t actually see a way out through hard work alone. We can hope there’s a miracle that violates some aspect of my background model, and we can try to prepare for that unknown miracle; preparing for an unknown miracle probably looks like “Trying to die with more dignity on the mainline” (because if you can die with more dignity on the mainline, you are better positioned to take advantage of a miracle if it occurs).
[...]
Eliezer Yudkowsky
I have a few stupid ideas I could try to investigate in ML, but that would require the ability to run significant-sized closed ML projects full of trustworthy people, which is a capability that doesn’t seem to presently exist. Plausibly, this capability would be required in any world that got some positive model violation (“miracle”) to take advantage of, so I would want to build that capability today. I am not sure how to go about doing that either. [...] What I’d like to exist is a setup where I can work with people that I or somebody else has vetted as seeming okay-trustworthy, on ML projects that aren’t going to be published.
[...]
Anonymous
How do you feel about the safety community as a whole and the growth we’ve seen over the past few years?
Eliezer Yudkowsky
Very grim. I think that almost everybody is bouncing off the real hard problems at the center and doing work that is predictably not going to be useful at the superintelligent level, nor does it teach me anything I could not have said in advance of the paper being written. People like to do projects that they know will succeed and will result in a publishable paper, and that rules out all real research at step 1 of the social process.
Paul Christiano is trying to have real foundational ideas, and they’re all wrong, but he’s one of the few people trying to have foundational ideas at all; if we had another 10 of him, something might go right.
Chris Olah is going to get far too little done far too late. We’re going to be facing down an unalignable AGI and the current state of transparency is going to be “well look at this interesting visualized pattern in the attention of the key-value matrices in layer 47” when what we need to know is “okay but was the AGI plotting to kill us or not”. But Chris Olah is still trying to do work that is on a pathway to anything important at all, which makes him exceptional in the field.
The things I’d mainly recommend are interventions that:
Help ourselves think more clearly. (I imagine this including a lot of trying-to-become-more-rational, developing and following relatively open/honest communication norms, and trying to build better mental models of crucial parts of the world.)
Help relevant parts of humanity (e.g., the field of ML, or academic STEM) think more clearly and understand the situation.
Help us understand and resolve major disagreements. (Especially current disagreements, but also future disagreements, if we can e.g. improve our ability to double-crux in some fashion.)
Try to solve the alignment problem, especially via novel approaches.
In particular: the biggest obstacle to alignment seems to be ‘current ML approaches are super black-box-y and produce models that are very hard to understand/interpret’; finding ways to better understand models produced by current techniques, or finding alternative techniques that yield more interpretable models, seems like where most of the action is.
Think about the space of relatively-plausible “miracles”, think about future evidence that could make us quickly update toward a miracle-claim being true, and think about how we should act to take advantage of that miracle in that case.
Build teams and skills that are well-positioned to take advantage of miracles when and if they arise. E.g., build some group like Redwood into an org that’s world-class in its ability to run ML experiments, so we have that capacity already available if we find a way to make major alignment progress in the future.
This can also include indirect approaches, like ‘rather than try to solve the alignment problem myself, I’ll try to recruit physicists to work on it, because they might bring new and different perspectives to bear’.
Though I definitely think there’s a lot to be said for more people trying to solve the alignment problem themselves, even if they’re initially pessimistic they’ll succeed!
I think alignment is still the big blocker on good futures, and still the place where we’re most likely to see crucial positive surprises, if we see them anywhere—possibly Eliezer would disagree here.
Upvoted. Thanks.
I’ll state that, in my opinion, it shouldn’t necessarily be the responsibility of MIRI or even Eliezer to clarify what was meant by a position that is stated but taken out of context. I’m not sure, but it seems as though at least a significant minority of those who’ve been alarmed by some of Eliezer’s statements haven’t read the full post that would put them in a less dramatic context.
Yet the errant signals sent seem important to rectify, as the misconceptions they create make it harder for MIRI to coordinate with other actors in the field of AI alignment.
My impression is that misunderstanding about all of this is widespread, in the sense that there are at least a few people in every part of the field who don’t understand what MIRI is about these days at all. I don’t know how widespread it is in terms of what portion of other actors in the field are generally confused about MIRI.