Also, which vectors are you using? Is this the final output layer?
I suggest trying the vectors from layers 0-48 in GPT2-xl. I get the impression that visualizations of those layers look more like a submerged iceberg than a helix...
Nope, this is the pos_embed matrix! So before the first layer.
I see. I'll try that, thanks!
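A minimal sketch of the extraction being discussed, assuming the Hugging Face `transformers` API (where `GPT2Model` exposes the learned positional-embedding table as `wpe`) and scikit-learn's PCA; the 3D projection is what would then be plotted to look for a helix:

```python
# Hypothetical sketch (not from the thread): load GPT-2 XL's learned
# positional-embedding matrix -- the pos_embed table applied before the
# first layer -- and project it to 3D with PCA for visualization.
from sklearn.decomposition import PCA
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2-xl")

# wpe is an nn.Embedding of shape (n_positions, n_embd) = (1024, 1600) for XL
pos_embed = model.wpe.weight.detach().numpy()

# Reduce to 3 principal components; each row is one position's 3D coordinate
coords = PCA(n_components=3).fit_transform(pos_embed)
print(coords.shape)  # (1024, 3)
```

For the per-layer comparison suggested above, running the model with `output_hidden_states=True` returns 49 hidden-state tensors (the embedding output plus one per block), which can each be projected the same way.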