I don’t know if they’d put it like this, but IMO solving/understanding superposition is an important part of being able to really grapple with circuits in language models, and this is why it’s a focus of the Anthropic interp team.
Based on my convos with them, the Anthropic team does seem like a clear example of this, at least insofar as you think understanding circuits in real models with more than one MLP layer is important for interp: superposition almost entirely stops you from using the standard features-as-directions approach!