I think that (1) is interesting. This sounds plausible, but I do not know of any examples of this perspective being fleshed out. Do you know of any posts on this?
I don’t know if they’d put it like this, but IMO solving/understanding superposition is an important part of being able to really grapple with circuits in language models, and this is why it’s a focus of the Anthropic interp team.
At least based on my convos with them, the Anthropic team does seem like a clear example of this, at least insofar as you think understanding circuits in real models with more than one MLP layer is important for interp—superposition almost entirely stops you from using the standard features-as-directions approach!
I would argue that ARC’s research is justified by (1) (roughly speaking). Sadly, I don’t think that there are enough posts on their current plans for this to be clear or easy for me to point at. There might be some posts coming out soon.