Here are the plots you asked for, for all heads! You can find them at:
https://github.com/adamyedidia/resid_viewer/tree/main/experiments/pngs
I haven’t looked too carefully yet, but it looks like it makes little difference for most heads and is important for L0H4 and L0H7.
Thank you! I’m still surprised at how little most heads in L0 + L1 seem to be using the positional embeddings. L1H4 looks reasonably uniform, so I could accept that maybe it somehow feeds into L2H2.