I have found MIRI’s strategy baffling in the past. I think I’m understanding it better after spending some time going deep on their AI risk arguments. I wish they’d spend more effort communicating with the rest of the alignment community, but I’m also happy to try to do that communication. I certainly don’t speak for MIRI.
On the surface, their strategy seems absurd. They think doom is ~99% likely, so they’re going to try to shut it all down—stop AGI research entirely. They know that this probably won’t work; it’s just the least-doomed strategy in their world model. It’s playing to the outs, or dying with dignity.
The weird thing here is that their >90% p(doom) disagrees with almost everyone else who thinks seriously about AGI risk. You can dismiss a lot of people as not having grappled with the most serious arguments for alignment difficulty, but relative long-timers like Rohin Shah and Paul Christiano definitely have. People of that nature tend to have higher p(doom) estimates than optimists who are newer to the game and think more about current deep nets, but much lower estimates than MIRI leadership's.
Yes, I agree that this should strike an outside observer as weird the first time they notice it. I think you have done a pretty good job of keying in on important cruxes between people who are far on the doomer side and people who are still worried but not nearly to that extent.
That being said, there is one other specific point that I think is important to see fully spelled out. You kind of gestured at it with regard to corrigibility when you referenced my post about coherence theorems, but you didn't key in on it in detail. More explicitly, what I am referring to (piggybacking off of another comment I left on that post) is that Eliezer and MIRI-aligned people believe in a very specific set of conclusions about what AGI cognition must be like (and their concerns about corrigibility, for instance, are logically downstream of their strong belief in this sort-of realism about rationality):
Here is the important insight, at least from my perspective: while I would expect a lot (or maybe even a majority) of AI alignment researchers to agree with some or most of those claims (meaning, to believe them with >80% probability), I think the way MIRI people get to their very confident belief in doom is that they believe all of those claims are true (with essentially >95% probability). Eliezer is a law-thinker above all else when it comes to powerful optimization and cognition; he has been ever since the early Sequences 17 years ago, and he seems (in my view excessively and misleadingly) confident that he truly gets how strong optimizers have to function.
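To make the conjunctive arithmetic concrete (the number of claims here is my own illustrative assumption, not something the original argument specifies): if the doom case rests on, say, five roughly independent claims, then a researcher at 80% credence per claim ends up far less confident in the whole package than one at 95% per claim:

$$0.80^5 \approx 0.33 \qquad \text{vs.} \qquad 0.95^5 \approx 0.77$$

Small per-claim differences in confidence compound quickly across a conjunction, which is part of why credences that look superficially similar claim-by-claim can diverge so sharply on the bottom line.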