I’m not sure the tuned lens indicates that the model is doing iterative prediction. It shows that if you train a linear classifier for each layer to predict the next token from that layer’s activations, the classifiers get more and more accurate as you progress through the model. But that’s what we’d expect from any model, regardless of whether it is doing iterative prediction; each layer uses the features from the previous layer to compute features that are more useful to the next layer. The InceptionV1 network analysed in the Distill circuits thread starts by computing lines and gradients, then curves, then circles, then eyes, then faces, etc. Predicting the class from the presence of faces is easier than from the presence of lines and gradients, so if you trained a tuned lens on InceptionV1 you would see the same pattern: lenses from later layers would have lower perplexity. To really show iterative prediction, I think you would have to be able to use the same lens for every layer; that would show there is some consistent representation of the prediction being updated at each layer.
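To make the setup I’m describing concrete, here is a minimal sketch (not the tuned-lens authors’ code): fit one linear probe per layer that maps that layer’s hidden state to next-token logits, and check that probes from later layers reach lower perplexity. The model choice, data, and training budget are illustrative assumptions only.

```python
# Hedged sketch: per-layer linear probes from hidden states to next-token logits.
import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()
tok = GPT2TokenizerFast.from_pretrained("gpt2")

text = "The quick brown fox jumps over the lazy dog. " * 20   # toy data, purely illustrative
ids = tok(text, return_tensors="pt").input_ids.to(device)

with torch.no_grad():
    out = model(ids, output_hidden_states=True)
hiddens = out.hidden_states            # (n_layers + 1) tensors of shape [1, seq, d_model]
targets = ids[0, 1:]                   # next-token targets for each position
vocab, d_model = model.config.vocab_size, model.config.n_embd

lenses = {}
for layer, h in enumerate(hiddens):
    lens = torch.nn.Linear(d_model, vocab).to(device)
    opt = torch.optim.Adam(lens.parameters(), lr=1e-3)
    x = h[0, :-1].detach()             # activations at positions that have a known next token
    for _ in range(200):               # tiny illustrative training loop
        loss = F.cross_entropy(lens(x), targets)
        opt.zero_grad(); loss.backward(); opt.step()
    lenses[layer] = lens
    print(f"layer {layer:2d}: probe perplexity {loss.exp().item():.1f}")
```

(The actual tuned lens reuses the model’s unembedding rather than learning a fresh vocab-sized map, but the qualitative point, that later layers probe better, is the same.)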
Here’s the relevant figure from the tuned lens paper: the transfer penalties for using a lens from one layer on another layer are small but meaningfully non-zero, and tend to increase the further apart the layers are in the model. That the penalties are small is suggestive that GPT might be doing something like iterative prediction, but the evidence isn’t compelling enough for my taste.
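For concreteness, a hedged continuation of the sketch above (it assumes `hiddens`, `targets`, and the trained `lenses` are still in scope): apply the lens trained for layer i to the activations of layer j, and report the extra cross-entropy relative to layer j’s own lens, which is roughly what the transfer-penalty figure shows.

```python
# Entry [i, j]: extra cross-entropy from using layer i's lens on layer j's activations,
# relative to layer j's own lens (so the diagonal is zero by construction).
import torch
import torch.nn.functional as F

n = len(lenses)
loss = torch.zeros(n, n)
with torch.no_grad():
    for i in range(n):                 # layer the lens was trained on
        for j in range(n):             # layer the lens is applied to
            x = hiddens[j][0, :-1]
            loss[i, j] = F.cross_entropy(lenses[i](x), targets).item()

transfer_penalty = loss - loss.diag()[None, :]   # subtract each column's matched-lens loss
print(transfer_penalty.round(decimals=2))
```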
Thanks for the insightful response! Agree it’s just suggestive for now. Though more so than with image models (where I’d expect lenses to transfer really badly, but don’t know). Perhaps it being a residual network is the key thing: since effective path lengths are low, most of the information is “carried along” unchanged, meaning the same probe continues working at other layers. Idk
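One way to state that hypothesis (my paraphrase, schematically ignoring layer norms, and not something the tuned lens paper claims): each block only adds a small update to the residual stream, so a linear probe $W$ that reads the stream at one layer keeps approximately working at the next.

$$
h_{\ell+1} = h_\ell + \operatorname{Attn}_\ell(h_\ell) + \operatorname{MLP}_\ell(h_\ell) = h_\ell + \Delta_\ell,
\qquad
W h_{\ell+1} = W h_\ell + W \Delta_\ell \approx W h_\ell \quad \text{when } \lVert \Delta_\ell \rVert \ll \lVert h_\ell \rVert .
$$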