Regarding Ortega et al., I agree that the proof presented in the paper is just about how a single generator can be equivalent to sequences of multiple generators. The point the authors use that proof to make, however, is somewhat broader: your model can learn a learning algorithm even when the task you give it isn’t explicitly a meta-learning task. And since a learning algorithm is a type of search/optimization algorithm, if you recast that conclusion in the language of Risks from Learned Optimization, you get exactly the concern about mesa-optimization, which is that models can learn optimization algorithms even when you don’t intend them to.