The dominant view I expect among people who disagree with this distinction is simply that, as optimizers become more powerful, there might be a smooth transition from an optimizer_1 to an optimizer_2. That is, if an optimizer is trained in some simulated environment, then from our point of view it may well look like it is performing a local, constrained search for policies within its training environment. However, when the optimizer is taken off the training distribution, it may act more like an optimizer_2.
One particular example would be if we were dumping so much compute into selecting for mesa optimizers that they became powerful enough to understand external reality. On the training distribution they would do well, but off it they would simply pursue whatever their mesa objective was. In this case it might look as though the system was an optimizer_2 all along and we were simply mistaken about its search capabilities. On the other hand, the task we gave it was limited enough that we initially thought it would only run optimizer_1 searches.
That said, I agree that it is difficult to see how such a transition from optimizer_1 to optimizer_2 could occur in the real world.
I should clarify that I’m not necessarily saying that there can’t be cases in which a system that is believed or intended to be an optimizer_1 might become, or turn out to be, an optimizer_2; I have not really argued for or against this. What I want to do is enable clearer thinking about the issue, so that one does not slide between these two concepts without noticing.