I’m a little confused as to why there’s any question here. Every algorithm lies on a spectrum of tradeoffs from general to narrow. The narrower a class of solved problems, the more efficient (in any way you care to name) an algorithm can be: a Tic-Tac-Toe solver is going to be a lot more efficient than AIXI.
Meta-learning works because the inner algorithm can be far more specialized, and thus, more performant or sample-efficient than the highly general outer algorithm which learned the inner algorithm.
For example, in Dactyl, PPO trains an RNN to adapt to many possible robot hands on the fly in as few samples as possible; the resulting RNN is probably several orders of magnitude more sample-efficient than training an RNN online with PPO directly. “Why not just use that RNN for Dota 2, if it’s so much better than PPO?” Well, because Dota 2 has little or nothing to do with robotic hands rotating cubes, so an algorithm that excels at robot-hand manipulation will not transfer to Dota 2. PPO will still work, though.
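To make the outer/inner split concrete, here is a minimal toy sketch of the control flow. This is not the actual Dactyl/PPO setup: the "tasks" are just hidden scalars, the inner algorithm is a hand-rolled adaptation rule, and the outer algorithm is a crude evolution-strategies stand-in for PPO. All names (`sample_task`, `inner_adapt`, `outer_update`, `theta`) are made up for illustration.

```python
import numpy as np

def sample_task(rng):
    """A 'task' here is just a hidden target scalar the inner loop must find."""
    return rng.uniform(-1.0, 1.0)

def inner_adapt(theta, task, steps=5):
    """Inner algorithm: a fast, task-specific adaptation rule whose
    behaviour (here, a step size) was *learned* by the outer loop."""
    guess = 0.0
    total_reward = 0.0
    for _ in range(steps):
        reward = -(guess - task) ** 2          # feedback from this task
        guess += theta["lr"] * (task - guess)  # specialized update, few samples
        total_reward += reward
    return total_reward

def outer_update(theta, rng, population=32, sigma=0.05, meta_lr=0.02):
    """Outer algorithm: slow, general, sample-hungry search over the inner
    rule's parameters (a crude evolution-strategies step, standing in for PPO)."""
    noise = rng.normal(size=population)
    returns = np.empty(population)
    for i in range(population):
        perturbed = {"lr": theta["lr"] + sigma * noise[i]}
        returns[i] = inner_adapt(perturbed, sample_task(rng))
    advantage = (returns - returns.mean()) / (returns.std() + 1e-8)
    theta["lr"] += meta_lr * np.dot(advantage, noise) / (population * sigma)
    return theta

rng = np.random.default_rng(0)
theta = {"lr": 0.01}                 # inner rule starts out slow and generic
for step in range(200):              # outer loop: many tasks, many samples
    theta = outer_update(theta, rng)
print("learned inner step size:", theta["lr"])
# After meta-training, inner_adapt solves a *new* task from this family in a
# handful of steps -- but it would be useless on an unrelated task family.
```

The point of the sketch is just the asymmetry: the outer loop burns through thousands of task samples to tune the inner rule, while the learned inner rule adapts to a fresh task in a few steps, and only for tasks drawn from the family it was meta-trained on.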
Here on LW / AF, “mesa optimization” seems to only apply if there’s some sort of “general” learning algorithm, especially one that is “using search”, for reasons that have always been unclear to me. Some relevant posts taking the opposite perspective (which I endorse):
Is the term mesa optimizer too narrow?
Why is pseudo-alignment “worse” than other ways ML can fail to generalize?