I am aware of the definition of a mesa-optimizer. I’m just saying that this definition doesn’t capture much of the danger or the exciting stuff: it doesn’t matter much if the model contains something that optimizes over, e.g., $L(x) = (x^2 - 2)^2$. The danger comes when the inner objective is more consequentialist than that.
I also think that it’s pretty bad to claim that something is only an optimizer if it’s a power-seeking consequentialist agent. For example, this would imply that the outer loop that produces neural network policies (by gradient descent on network parameters) is not an optimizer!
If the outer loop isn’t connected to some mechanism that evaluates the consequences of different policies in the real world, then you are probably training something that mimics the prespecified training data rather than searching for novel powerful policies. This isn’t useless (the explosive growth of statistical models for various purposes proves as much), but it is unlikely to be dangerous unless coupled with some other process that handles the optimization.
So I’m not saying these things aren’t optimizers, but much of the worry about mesa-optimizers is about consequentialist optimizers, not generic optimizers. Most functions are perfectly safe to optimize.
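To make the "generic optimizer" case concrete, here is a minimal sketch of plain gradient descent on the toy objective L(x) = (x^2 - 2)^2 mentioned earlier. The function names, starting point, learning rate, and step count are illustrative assumptions, not anything from the mesa-optimization literature. The point is only that this loop is an optimizer in the ordinary sense, yet it never evaluates the consequences of anything in the world.

```python
def loss(x: float) -> float:
    """Toy objective L(x) = (x^2 - 2)^2, minimized at x = +/- sqrt(2)."""
    return (x**2 - 2) ** 2


def grad(x: float) -> float:
    """Analytic derivative: dL/dx = 4x(x^2 - 2)."""
    return 4 * x * (x**2 - 2)


def minimize(x0: float = 1.0, lr: float = 0.05, steps: int = 200) -> float:
    """Plain gradient descent; hyperparameters are arbitrary illustrative choices."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


x_star = minimize()
print(x_star, loss(x_star))
```

Starting from x0 = 1.0, the iterate settles near the positive root of x^2 = 2. Nothing about this search is power-seeking or world-directed, which is the sense in which most functions are safe to optimize.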