On page 8 of the paper they say, “our work does not demonstrate or address mesa-optimization”. I think it’s because none of the agents in their paper has learned an optimization process (i.e. is running something like a search algorithm on the inside).
FWIW I believe I wrote that sentence and I now think this is a matter of definition, and that it’s actually reasonable to think of an agent that e.g. reliably solves a maze as an optimizer even if it does not use explicit search internally.
I am really curious about the disagree votes here: do people think this is not an empirical work demonstrating mesa-optimization?