LawrenceC comments on The Engineer’s Interpretability Sequence (EIS) I: Intro

LawrenceC 10 Feb 2023 22:46 UTC
At least based on my convos with them, the Anthropic team does seem like a clear example of this, insofar as you think understanding circuits in real models with more than one MLP layer is important for interp: superposition stops you from using the standard features-as-directions approach almost entirely!