I have found MIRI’s strategy baffling in the past. I think I’m understanding it better after spending some time going deep on their AI risk arguments. I wish they’d spend more effort communicating with the rest of the alignment community, but I’m also happy to try to do that communication. I certainly don’t speak for MIRI.
On the surface, their strategy seems absurd. They think doom is ~99% likely, so they’re going to try to shut it all down—stop AGI research entirely. They know that this probably won’t work; it’s just the least-doomed strategy in their world model. It’s playing to the outs, or dying with dignity.
The weird thing here is that their >90% p(doom) disagrees with almost everyone else who thinks seriously about AGI risk. You can dismiss a lot of people as not having grappled with the most serious arguments for alignment difficulty, but relative long-timers like Rohin Shah and Paul Christiano definitely have. People of that nature tend to have higher p(doom) estimates than optimists who are newer to the game and think more about current deep nets, but much lower estimates than MIRI leadership's.
Yes, I agree that this should strike an outside observer as weird the first time they notice it. I think you have done a pretty good job of keying in on important cruxes between people who are far on the doomer side and people who are still worried but not nearly to that extent.
That being said, there is one other specific point that I think is important to see fully spelled out. You kind of gestured at it with regard to corrigibility when you referenced my post about coherence theorems, but you didn't key in on it in detail. More explicitly, what I am referring to (piggybacking off of another comment I left on that post) is that Eliezer and MIRI-aligned people believe in a very specific set of conclusions about what AGI cognition must be like (and their concerns about corrigibility, for instance, are logically downstream of their strong belief in this sort-of realism about rationality):
Here is the important insight, at least from my perspective: while I would expect a lot (or maybe even a majority) of AI alignment researchers to agree with some or most of those claims (meaning, to believe them with >80% probability), I think the way MIRI people get to their very confident belief in doom is that they believe all of those claims are true (with essentially >95% probability). Eliezer is a law-thinker above all else when it comes to powerful optimization and cognition; he has been ever since the early Sequences 17 years ago, and he seems (in my view excessively and misleadingly) confident that he truly gets how strong optimizers have to function.
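To make the conjunctive arithmetic concrete (the number of claims here is my own illustrative assumption, not something the original argument specifies): if the doom case rests on, say, five roughly independent claims, then a researcher at 80% credence per claim ends up far less confident in the whole package than one at 95% per claim:

$$0.80^5 \approx 0.33 \qquad \text{vs.} \qquad 0.95^5 \approx 0.77$$

Small per-claim differences in confidence compound quickly across a conjunction, which is part of why credences that look superficially similar claim-by-claim can diverge so sharply on the bottom line.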