As I just finished explaining, the claim of myopia is that a model optimized for next-token prediction models only the next token and nothing else, because “it is just trained to predict the next token conditional on its input”. The claim of non-myopia is that the model also models future tokens beyond the next one, a capability induced by attempting to model the next token better.
These definitions are not equivalent to the ones we gave (and as far as I’m aware the definitions we use are much closer to commonly used definitions of myopia and non-myopia than the ones you give here).
Arthur is also entirely correct that your examples are not evidence of non-myopia by the definitions we use.
The definition of myopia that we use is that the model minimises loss on the next token and the next token alone. This is not the same as requiring that the model only ‘models’ / ‘considers information only directly relevant to’ the next token and the next token alone.
A model exhibiting myopic behaviour can still be great at the kinds of tasks you describe as requiring ‘modelling of future tokens’. The claim that some model was displaying myopic behaviour here would simply be that all of this ‘future modelling’ (or any other internal processing) is done entirely in service of minimising loss on just the next token. This is in contrast to the kinds of non-myopic models we are considering in this post, where minimising loss over a multi-token completion encourages sacrificing some loss on early tokens in certain situations.
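A toy sketch of the structural point, using decoding as an analogy (all numbers here are made up for illustration, not from the post): under a per-token objective, the best choice at each position is the single most likely next token, but under a joint objective over the whole completion, the optimum can assign a *less* likely first token because it sets up a much better continuation.

```python
import math
from itertools import product

# Hypothetical two-step autoregressive distribution: a first token from
# {"a", "b"}, then a second token from {"x", "y"} conditional on the first.
P = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.55, "y": 0.45},
    ("b",): {"x": 0.9, "y": 0.1},
}

def greedy_completion():
    """Myopic: at each step, take the single most probable next token,
    ignoring what it does to later steps."""
    seq = ()
    while seq in P:
        seq = seq + (max(P[seq], key=P[seq].get),)
    return seq

def best_completion():
    """Non-myopic: maximise the joint probability of the full completion
    (equivalently, minimise total log loss over the whole sequence)."""
    best, best_p = None, -1.0
    for first, second in product(P[()], P[("a",)]):
        p = P[()][first] * P[(first,)][second]
        if p > best_p:
            best, best_p = (first, second), p
    return best, best_p

# Greedy picks "a" (0.6 > 0.4) and ends up with joint probability
# 0.6 * 0.55 = 0.33; the joint optimum takes the locally worse "b"
# (0.4) to reach "b", "x" with 0.4 * 0.9 = 0.36.
```

The point is only the shape of the objective: a per-token optimiser never has a reason to take the 0.4 branch, whereas an objective scored over the whole completion does.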