Maybe I’m not explaining it correctly, or I’m misinterpreting what you’re saying, or I’m just wrong. But the problem is that you can get a mesa-optimizer even when you’re training on the real objective: the result of your ML process may itself be an optimizer that performs well on the test cases for as long as you run or simulate it, but that pursues something different in the limit of resources/compute.