The whole system is an Optimizing AI, according to the definition given above, but neither of the two parts is one by itself.
Yeah, I’m talking about the whole system.
it doesn’t seem to have the flavor of mesa-optimization
Yeah, I agree it doesn’t fit the explanation / definition in Risks from Learned Optimization. I don’t like that definition, and usually mean something like “running the model parameters instantiates a computation that does ‘reasoning’”, which I think does fit this example. I mentioned this a bit later in the comment:
I want to note that under this approach the notion of “search” and “mesa objective” are less natural, which I see as a pro of this approach [...]: the argument is that we’ll get a general inner optimizing AI, but it doesn’t say much about what task that AI will be optimizing for (and it could be an optimizing AI that is retargetable by human instructions).