With regard to the editing text discussion, I was thinking of a really simple approach where we resample words in the text at random. Perhaps that wouldn’t work great, but I do think editing has potential because it allows for more sophisticated thinking.
Let’s say we want our language model to design us an aircraft. Perhaps it starts by describing the engine, and then it describes the wings. Standard autoregressive text generation (assuming no lookahead) will allow the engine design to influence the wing design (assuming the engine design is inside the context window when it’s writing about the wings), but it won’t allow the wing design to influence the engine design. However, if the model is allowed to edit its text, it can rethink the engine in light of the wings and rethink the wings in light of the engine until it’s designed a really good aircraft.
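To make the random-resampling idea concrete, here is a minimal toy sketch. A real version would resample words with a language model and keep edits the model scores higher; here `score` is a toy stand-in (it just counts positions matching a target phrase), purely so the loop is runnable. `VOCAB`, `TARGET`, and `score` are illustrative inventions, not part of any real setup.

```python
import random

# Toy sketch of editing-by-resampling: pick a random position, resample
# the word there, and keep the edit whenever a scorer doesn't get worse.
# A real setup would score with a language model's log-probability.

VOCAB = ["the", "engine", "wing", "design", "aircraft", "good"]
TARGET = ["good", "aircraft", "design"]

def score(words):
    # Toy scorer: number of positions matching the target phrase.
    return sum(w == t for w, t in zip(words, TARGET))

def resample_edit(words, steps=500, seed=0):
    rng = random.Random(seed)
    words = list(words)
    best = score(words)
    for _ in range(steps):
        proposal = words.copy()
        proposal[rng.randrange(len(proposal))] = rng.choice(VOCAB)
        s = score(proposal)
        if s >= best:  # keep edits that don't hurt the score
            words, best = proposal, s
    return words

print(resample_edit(["the", "engine", "wing"]))
```

Even this crude loop shows the key property of editing: earlier words can be revised in light of later ones, which pure left-to-right generation can’t do.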
In particular, it would be good to figure out some way of contriving a mesa-optimization setup, such that we could measure if these fixes would prevent it or not.
Agreed. Perhaps we could generate lots of travelling salesman problem instances where the greedy approach doesn’t get you something that looks like the optimal route, then try to train a GPT architecture to predict the cities in the optimal route in order?
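A rough sketch of that instance-generation step: sample small random TSP instances and keep only those where the greedy (nearest-neighbour) tour is noticeably longer than the brute-force optimal tour. Sizes are kept tiny so exact enumeration stays cheap, and the 1.1 gap threshold is an arbitrary illustrative choice.

```python
import itertools
import math
import random

def tour_length(cities, order):
    # Total length of the closed tour visiting cities in the given order.
    return sum(math.dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def greedy_tour(cities):
    # Nearest-neighbour heuristic starting from city 0.
    unvisited = set(range(1, len(cities)))
    order = [0]
    while unvisited:
        nxt = min(unvisited, key=lambda c: math.dist(cities[order[-1]], cities[c]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

def optimal_tour(cities):
    # Brute force: fix city 0 and try every permutation of the rest.
    rest = range(1, len(cities))
    return min(([0] + list(p) for p in itertools.permutations(rest)),
               key=lambda order: tour_length(cities, order))

def hard_instance(n=7, gap=1.1, seed=0):
    # Resample until greedy is at least `gap` times longer than optimal.
    rng = random.Random(seed)
    while True:
        cities = [(rng.random(), rng.random()) for _ in range(n)]
        g = tour_length(cities, greedy_tour(cities))
        o = tour_length(cities, optimal_tour(cities))
        if g > gap * o:
            return cities, o, g

cities, opt_len, greedy_len = hard_instance()
print(f"optimal {opt_len:.3f} vs greedy {greedy_len:.3f}")
```

The optimal tours of the surviving instances would then be the training targets, so the model can’t score well by just imitating the greedy heuristic.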
This is an interesting quote:
...in our experience we find that lean stochastic local search techniques such as simulated annealing are often the most competitive for hard problems with little structure to exploit.
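The quoted point about lean stochastic local search can be made concrete with a minimal simulated-annealing TSP solver: 2-opt reversal moves and a geometric cooling schedule. The instance size and schedule parameters below are illustrative, not tuned.

```python
import math
import random

def tour_length(cities, order):
    # Total length of the closed tour visiting cities in the given order.
    return sum(math.dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def anneal(cities, steps=20000, t0=1.0, cooling=0.9995, seed=0):
    rng = random.Random(seed)
    order = list(range(len(cities)))
    cur = tour_length(cities, order)
    t = t0
    for _ in range(steps):
        i, j = sorted(rng.sample(range(len(order)), 2))
        candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]  # 2-opt move
        new = tour_length(cities, candidate)
        # Always accept improvements; accept worse tours with
        # Boltzmann probability exp(-delta / temperature).
        if new < cur or rng.random() < math.exp((cur - new) / t):
            order, cur = candidate, new
        t *= cooling
    return order, cur

rng = random.Random(1)
cities = [(rng.random(), rng.random()) for _ in range(12)]
order, length = anneal(cities)
print(f"annealed tour length {length:.3f}")
```

Note how little problem structure this exploits: the only move is “reverse a random segment”, which is exactly why it transfers to hard problems where cleverer heuristics have nothing to grab onto.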
I suspect GPT will be biased towards avoiding mesa-optimization and making use of heuristics, so the best contrived mesa-optimization setup may be an optimization problem with little structure where heuristics aren’t very helpful. Maybe we could focus on problems where non-heuristic methods such as branch and bound / backtracking are considered state of the art, and train the architecture to mesa-optimize by starting with easy instances and gradually moving to harder and harder ones.
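For reference, here is a sketch of the non-heuristic style of method mentioned above: a tiny branch-and-bound TSP solver that extends partial tours city by city and backtracks as soon as a partial tour is already longer than the best complete tour found so far. The bound is deliberately weak; real solvers use much stronger lower bounds.

```python
import itertools
import math
import random

def tour_length(cities, order):
    # Total length of the closed tour visiting cities in the given order.
    return sum(math.dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def branch_and_bound_tsp(cities):
    best_len = math.inf
    best_order = None

    def extend(order, length, remaining):
        nonlocal best_len, best_order
        if length >= best_len:  # prune: this branch can't improve
            return
        if not remaining:
            total = length + math.dist(cities[order[-1]], cities[order[0]])
            if total < best_len:
                best_len, best_order = total, order
            return
        for c in remaining:
            extend(order + [c],
                   length + math.dist(cities[order[-1]], cities[c]),
                   remaining - {c})

    extend([0], 0.0, set(range(1, len(cities))))
    return best_order, best_len

rng = random.Random(2)
cities = [(rng.random(), rng.random()) for _ in range(8)]
order, length = branch_and_bound_tsp(cities)
print(f"optimal tour length {length:.3f}")
```

A model trained to reproduce traces of this kind of search-with-backtracking, on a curriculum of growing instance sizes, is the sort of setup where internal optimization would plausibly be incentivized.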
Clarifying Q: Does mesa-optimization refer to any inner optimizer, or one that is in particular not aligned with the outer context?
I was using it to refer to “any inner optimizer”. I think that’s the standard usage but I’m not completely sure.