Tom Lieberum comments on Gears-Level Mental Models of Transformer Interpretability

Tom Lieberum 31 Mar 2022 13:08 UTC
1 point
Small nitpick:

The PCA plot uses the smallest version of GPT2, not the 1.5B-parameter model (that would be GPT2-XL). The small model performs significantly worse than the XL one, so I would be hesitant to draw conclusions from that experiment alone.
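
To make the size gap concrete, here is a rough sketch that counts parameters from the published GPT-2 hyperparameters. It assumes the standard architecture (tied input/output embeddings, 4x MLP expansion, biases on every linear layer); the function name `gpt2_param_count` is my own, not from any library.

```python
# Back-of-the-envelope parameter count for GPT-2 style models.
# Assumptions: tied token embedding / output head, learned position
# embeddings, MLP hidden size of 4 * d_model, biases on all linear layers.

VOCAB_SIZE = 50257  # GPT-2 BPE vocabulary size
N_CTX = 1024        # maximum context length


def gpt2_param_count(n_layer: int, d_model: int) -> int:
    """Return the total number of parameters for a GPT-2 style model."""
    token_emb = VOCAB_SIZE * d_model
    pos_emb = N_CTX * d_model

    # Per transformer block:
    attn_qkv = d_model * 3 * d_model + 3 * d_model   # fused Q, K, V projection
    attn_out = d_model * d_model + d_model           # attention output projection
    mlp_in = d_model * 4 * d_model + 4 * d_model     # MLP up-projection
    mlp_out = 4 * d_model * d_model + d_model        # MLP down-projection
    layer_norms = 2 * (2 * d_model)                  # two LayerNorms (scale + bias)
    per_block = attn_qkv + attn_out + mlp_in + mlp_out + layer_norms

    final_ln = 2 * d_model
    return token_emb + pos_emb + n_layer * per_block + final_ln


small = gpt2_param_count(n_layer=12, d_model=768)    # smallest GPT-2
xl = gpt2_param_count(n_layer=48, d_model=1600)      # GPT2-XL

print(f"GPT-2 small: {small / 1e6:.0f}M parameters")
print(f"GPT2-XL:     {xl / 1e9:.2f}B parameters")
print(f"Ratio:       {xl / small:.1f}x")
```

Under these assumptions the smallest model comes out at roughly 124M parameters, so GPT2-XL is more than ten times larger.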