You’ve had a few comments along these lines in this thread, and I think this is where you’re most severely failing to see the situation from Yudkowsky’s point of view.
From Yudkowsky’s view, explaining and justifying MIRI’s work (and the processes he uses to reach such judgements more generally) was the main point of the sequences. He has written more on the topic than anyone else in the world, by a wide margin. He basically spent several years full-time just trying to get everyone up to speed, because the inferential gap was very, very wide.
My memory of the sequences is that they’re far more about defending and explaining the alignment problem than about criticizing prosaic AGI (maybe because the term didn’t exist until Paul coined it years later?). Could you give me the best pointers to prosaic alignment criticism in the sequences? (I’ve read the sequences, but I don’t remember every single post, and my impression from memory is what I’ve written above.)
I also feel that there might be a discrepancy between who I think of when I think of prosaic alignment researchers and what the category means in general/to most people here. My category mostly includes AF posters, people from a bunch of places like EleutherAI/OpenAI/DeepMind/Anthropic/Redwood, and people from CHAI and FHI. I expect most of these people to have actually read the sequences and tried to understand MIRI’s perspective. Maybe someone could point out other places where prosaic alignment research is being done that I’m missing, especially places where people probably haven’t read the sequences? Or maybe I’m overestimating how many of the people in the places I mentioned have read the sequences?
I don’t mean to say that there’s critique of prosaic alignment specifically in the sequences. Rather, a lot of the generators of the Yudkowsky-esque worldview are in there. (That is how the sequences work: it’s not about arguing specific ideas around alignment, it’s about explaining enough of the background frames and generators that the argument becomes unnecessary. “Raise the sanity waterline” and all that.)
For instance, just the other day I ran across this:
Of this I learn the lesson: You cannot manipulate confusion. You cannot make clever plans to work around the holes in your understanding. You can’t even make “best guesses” about things which fundamentally confuse you, and relate them to other confusing things. Well, you can, but you won’t get it right, until your confusion dissolves. Confusion exists in the mind, not in the reality, and trying to treat it like something you can pick up and move around, will only result in unintentional comedy.
Similarly, you cannot come up with clever reasons why the gaps in your model don’t matter. You cannot draw a border around the mystery, put on neat handles that let you use the Mysterious Thing without really understanding it—like my attempt to make the possibility that life is meaningless cancel out of an expected utility formula. You can’t pick up the gap and manipulate it.
If the blank spot on your map conceals a land mine, then putting your weight down on that spot will be fatal, no matter how good your excuse for not knowing. Any black box could contain a trap, and there’s no way to know except opening up the black box and looking inside. If you come up with some righteous justification for why you need to rush on ahead with the best understanding you have—the trap goes off.
(The earlier part of the post had a couple of embarrassing stories about mistakes Yudkowsky made, which is where the lesson came from.) Reading that, I was like, “man, that sure does sound like the Yudkowsky-esque viewpoint on prosaic alignment”.
Or maybe I’m overestimating how many of the people in the places I mentioned have read the sequences?
I think you are overestimating. At the orgs you list, I’d guess at least 25% and probably more than half have not read the sequences. (Low confidence/wide error bars, though.)
Thanks for the pushback!