Note that there’s a risk of mesa-optimization developing if lookahead improves performance at any point during GPT’s training.
Why?
My thought was that if lookahead improves performance during some period of training, the model is liable to develop mesa-optimization during that period, and then find it useful for other things later on.