but I think it should also not be limited strictly to mesa-optimizers, which neither Rohin nor I expect to appear in practice. (Mesa-optimizers appear to me to be the formalization of the idea “what if ML systems, which by default are not well-described as EU maximizers, learned to be EU maximizers?” I suspect MIRI people have some unshared intuitions about why we might expect this, but I currently don’t have a good reason to believe this.)
I was surprised to see you say that Rohin (and you yourself) don't expect mesa-optimizers to appear in practice. I recently read the following in a comment of his on Alex Flint's "The ground of optimization", which seems to state pretty clearly that he does expect mesa-optimization from AGI development:
Deep learning AGI implies mesa optimization: Since deep learning is so sample inefficient, it cannot reach human levels of performance if we apply deep learning directly to each possible task T. (For example, it has to relearn how the world works separately for each task T.) As a result, if we do get AGI primarily via deep learning, it must be that we used deep learning to create a new optimizing AI system, and that system was the AGI.
Argument for mesa optimization: Due to the complexity and noise in the real world, most economically useful tasks require setting up a robust optimizing system, rather than directly creating the target configuration state. (See also the importance of feedback for more on this intuition.) It seems likely that humans will find it easier to create algorithms that then find AGIs that can create these robust optimizing systems, rather than creating an algorithm that is directly an AGI.
(The previous argument also applies: this is basically just a generalization of the previous point to arbitrary AI systems, instead of only deep learning.)
I want to note that under this approach the notion of “search” and “mesa objective” are less natural, which I see as a pro of this approach (see also here): the argument is that we’ll get a general inner optimizing AI, but it doesn’t say much about what task that AI will be optimizing for (and it could be an optimizing AI that is retargetable by human instructions).
But that comment is from two years ago, whereas yours is less than a year old, so perhaps his views changed in the meantime? I'd be curious to hear/read more about why neither of you expects mesa-optimizers to appear in practice.