Deep learning AGI implies mesa optimization: Since deep learning is so sample inefficient, it cannot reach human levels of performance if we apply deep learning directly to each possible task T. (For example, it has to relearn how the world works separately for each task T.) As a result, if we do get AGI primarily via deep learning, it must be that we used deep learning to create a new optimizing AI system, and that system was the AGI.
I don’t quite understand what this is saying.
Suppose we train a giant deep learning model via self-supervised learning on a ton of real-world data (like GPT-N, but w/ other sensory modalities besides text), and then we build a second system designed to provide a nice interface to the giant model.
We’d give task specifications to the interface, and it would have some smarts about how to consult the model to figure out what to do. (The interface might also be learned, via reinforcement or supervised learning, or it might be hand-coded.)
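To make the hypothetical concrete, here is a purely illustrative sketch of the two-piece setup; every name in it (WorldModel, Interface, etc.) is made up for this example, and nothing in the argument hangs on the details.

```python
# Illustrative sketch only: a giant pretrained model plus a thin interface that
# consults it. All names are invented; this is not a real library or proposal.

class WorldModel:
    """Stand-in for the giant self-supervised model (GPT-N-like, multimodal)."""

    def predict(self, context: str) -> str:
        # In reality this would be a learned predictive model; here it is a stub.
        return f"<model's best continuation of: {context!r}>"


class Interface:
    """Hand-coded (or learned) wrapper with some smarts about consulting the model."""

    def __init__(self, model: WorldModel):
        self.model = model

    def act(self, task_specification: str, observation: str) -> str:
        # Turn the task spec and current observation into a query for the model,
        # then treat the model's answer as the next action.
        query = (
            f"Task: {task_specification}\n"
            f"Observation: {observation}\n"
            f"Best next action:"
        )
        return self.model.predict(query)


# The combined system (model + interface) is the candidate AGI here; neither
# part is one on its own.
system = Interface(WorldModel())
print(system.act("make a cup of tea", "you are standing in a kitchen"))
```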
It seems plausible to me that a system comprising these two pieces, the model and the interface, could be an AGI according to the definition here, in that, when combined with a very wide variety of environments (with the task specification included as part of the environment), it could perform at least as well as a human.
And since most of the smarts seem like they’d be in the model rather than the interface, I’d count it as getting AGI “primarily via deep learning”, even if the interface was hand-coded.
But it’s not clear to me whether that would count as using deep learning to “create a new optimizing AI system”, which is itself the AGI. The whole system is an Optimizing AI, according to the definition given above, but neither of the two parts is one by itself, and it doesn’t seem to have the flavor of mesa-optimization as I understand it. So it seems to contradict the quoted claim.
Have I misunderstood what you’re saying here, or do you disagree with the characterization I gave of the hypothetical model + interface system? (Or have I perhaps misunderstood mesa-optimization?)
The whole system is an Optimizing AI, according to the definition given above, but neither of the two parts is one by itself
Yeah, I’m talking about the whole system.
it doesn’t seem to have the flavor of mesa-optimization
Yeah, I agree it doesn’t fit the explanation / definition in Risks from Learned Optimization. I don’t like that definition, and usually mean something like “running the model parameters instantiates a computation that does ‘reasoning’”, which I think does fit this example. I mentioned this a bit later in the comment:
I want to note that under this approach the notions of “search” and “mesa objective” are less natural, which I see as a pro of this approach [...]: the argument is that we’ll get a general inner optimizing AI, but it doesn’t say much about what task that AI will be optimizing for (and it could be an optimizing AI that is retargetable by human instructions).