A lot of the discussion of mesa-optimization seems confused.
One thing that might help clear up the confusion is to remember that "learning" and "inference" should not be thought of as cleanly separated in the first place; see, e.g., AIXI…
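To make the AIXI-style point concrete, here is the standard Bayes-mixture predictor (a sketch of the usual formulation, with $\mathcal{M}$ a hypothesis class and $w_\nu$ prior weights; notation is mine, not from the original):

```latex
% Mixture over hypotheses:
\xi(x_{1:n}) = \sum_{\nu \in \mathcal{M}} w_\nu \, \nu(x_{1:n})

% Prediction of the next symbol is just conditioning:
\xi(x_n \mid x_{<n})
  = \frac{\xi(x_{1:n})}{\xi(x_{<n})}
  = \sum_{\nu \in \mathcal{M}} w_\nu(x_{<n}) \, \nu(x_n \mid x_{<n}),
\quad
w_\nu(x_{<n}) = \frac{w_\nu \, \nu(x_{<n})}{\xi(x_{<n})}
```

The posterior reweighting of hypotheses (what we'd call "learning") and computing the predictive distribution (what we'd call "inference") are literally the same calculation here, which is why the two don't cleanly come apart.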
So when we ask "is it learning, or just solving the task without learning?", this seems like a confused framing to me. Suppose your ML system learned an excellent prior, and then just did Bayesian inference at test time. Is that learning? Sure, why not. It might not use a traditional search/optimization algorithm, but it probably has to do *something* like that for computational reasons if it wants to do efficient approximate Bayesian inference over a large hypothesis space, so...
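A minimal sketch of the "learned prior + test-time inference" picture (a toy setup I'm supplying for illustration, not anything from the original): the prior is baked in ahead of time, and at test time the system only does an exact Bayesian update, yet behaviorally it "adapts" to the observations it sees.

```python
import numpy as np

# Hypothesis space: possible coin biases (probability of heads).
hypotheses = np.array([0.1, 0.3, 0.5, 0.7, 0.9])

# "Learned" prior over hypotheses -- imagine this was distilled from training data.
prior = np.array([0.05, 0.15, 0.40, 0.30, 0.10])

def posterior_after(observations, prior, hypotheses):
    """Exact Bayesian update: multiply in the likelihood of each observation."""
    weights = prior.copy()
    for obs in observations:  # obs = 1 for heads, 0 for tails
        likelihood = hypotheses if obs == 1 else 1.0 - hypotheses
        weights = weights * likelihood
    return weights / weights.sum()

# "Test time": the system sees a few flips and its predictions shift --
# behaviorally this looks like learning, mechanically it is just inference.
obs = [1, 1, 0, 1, 1]
post = posterior_after(obs, prior, hypotheses)
pred_heads = float(post @ hypotheses)  # posterior predictive P(next flip = heads)
print("posterior over biases:", np.round(post, 3))
print("P(next = heads):", round(pred_heads, 3))
```

With a large hypothesis space this exact enumeration is intractable, which is where the "it probably has to do *something* like search/optimization internally" point comes in.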