It is in fact concentrated away from that, as you predicted! Here’s a cool scatter plot:
The blue points are the positional embeddings for gpt2-small, whereas the red points are the token embeddings.
If you want to play around with it yourself, you can find it in the experiments/ directory of the following GitHub repo: https://github.com/adamyedidia/resid_viewer.
You can skip most of the setup in the README if you just want to reproduce the experiment (there’s a lot of other stuff going on in the repository), but you’ll still need to install TransformerLens, sklearn, numpy, etc.
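If you just want a rough sense of what the experiment does without cloning the repo, here’s a minimal sketch using TransformerLens, sklearn, and matplotlib. The 2-D PCA projection is my assumption about how a scatter plot like this would be produced; the actual script in experiments/ may do it differently.

```python
# Minimal sketch of a positional-vs-token-embedding scatter plot for gpt2-small.
# Assumption: the plot is a 2-D PCA projection fit on both embedding matrices
# together; the actual experiments/ script may differ.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # gpt2-small

W_pos = model.W_pos.detach().cpu().numpy()  # [n_ctx, d_model] positional embeddings
W_E = model.W_E.detach().cpu().numpy()      # [d_vocab, d_model] token embeddings

# Fit PCA on the combined set so both clouds live in the same 2-D projection.
pca = PCA(n_components=2)
proj = pca.fit_transform(np.concatenate([W_pos, W_E], axis=0))
pos_proj, tok_proj = proj[: len(W_pos)], proj[len(W_pos):]

plt.scatter(tok_proj[:, 0], tok_proj[:, 1], s=2, c="red", label="token embeddings")
plt.scatter(pos_proj[:, 0], pos_proj[:, 1], s=2, c="blue", label="positional embeddings")
plt.legend()
plt.show()
```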