I don’t mean to say that there’s critique of prosaic alignment specifically in the sequences. Rather, a lot of the generators of the Yudkowsky-esque worldview are in there. (That is how the sequences work: it’s not about arguing specific ideas around alignment, it’s about explaining enough of the background frames and generators that the argument becomes unnecessary. “Raise the sanity waterline” and all that.)
For instance, just the other day I ran across this:
Of this I learn the lesson: You cannot manipulate confusion. You cannot make clever plans to work around the holes in your understanding. You can’t even make “best guesses” about things which fundamentally confuse you, and relate them to other confusing things. Well, you can, but you won’t get it right, until your confusion dissolves. Confusion exists in the mind, not in the reality, and trying to treat it like something you can pick up and move around, will only result in unintentional comedy.
Similarly, you cannot come up with clever reasons why the gaps in your model don’t matter. You cannot draw a border around the mystery, put on neat handles that let you use the Mysterious Thing without really understanding it—like my attempt to make the possibility that life is meaningless cancel out of an expected utility formula. You can’t pick up the gap and manipulate it.
If the blank spot on your map conceals a land mine, then putting your weight down on that spot will be fatal, no matter how good your excuse for not knowing. Any black box could contain a trap, and there’s no way to know except opening up the black box and looking inside. If you come up with some righteous justification for why you need to rush on ahead with the best understanding you have—the trap goes off.
(The earlier part of the post had a couple embarrassing stories of mistakes Yudkowsky made earlier, which is where the lesson came from.) Reading that, I was like, “man that sure does sound like the Yudkowsky-esque viewpoint on prosaic alignment”.
Or maybe I’m overestimating how many of the people in the places I mentioned have read the sequences?
I think you are overestimating. At the orgs you list, I’d guess at least 25% and probably more than half have not read the sequences. (Low confidence/wide error bars, though.)