I talk about this a bit here, but basically if you train huge models for a short period of time, you’re really relying on your inductive biases to find the simplest model that fits the data—and mesa-optimizers, especially deceptive mesa-optimizers, are quite simple, compressed policies.
I talk about this a bit here, but basically if you train huge models for a short period of time, you’re really relying on your inductive biases to find the simplest model that fits the data—and mesa-optimizers, especially deceptive mesa-optimizers, are quite simple, compressed policies.