Joseph Miller comments on The Residual Expansion: A Framework for thinking about Transformer Circuits

Joseph Miller 2 Aug 2024 22:53 UTC
6 points
1
If I understand correctly, this residual decomposition is equivalent to the edge / factorized view of a transformer described here.

Update: actually the residual decomposition is incorrect—see my other comment.
- Daniel Tan 5 Aug 2024 10:20 UTC
  1 point
  0
  Parent
  I agree, this seems like exactly the same thing, which is great! In hindsight it’s not surprising that you / other people have already thought about this.
  Do you think the ‘tree-ified view’ (to use your name for it) is a good abstraction for thinking about how a model works? Are individual terms in the expansion the right unit of analysis?
  - Joseph Miller 5 Aug 2024 18:50 UTC
    2 points
    0
    Parent
    The treeified view is different from the factorized view! See figure 1 here.
    I think the factorized view is pretty useful. But on the other hand, I think MLP + Attention Head circuits are too coarse-grained to be that interpretable.
    - Oliver Daniels 7 Aug 2024 5:22 UTC
      1 point
      0
      Parent
      Just to make it explicit and check my understanding: the residual decomposition is equivalent to the edge / factorized view of the transformer in that we can express any term in the residual decomposition as a set of edges that form a path from input to output, e.g.
      $\mathrm{Id}$ = input → output
      $(\mathrm{Attn}_{4}^{3} \circ \mathrm{MLP}_{2} \circ \mathrm{Attn}_{1}^{0})$ = input → Attn 1.0 → MLP 2 → Attn 4.3 → output
      And it follows that the (pre final layernorm) output of a transformer is the sum of all the “paths” from input to output constructed from the factorized DAG.
      - Joseph Miller 8 Aug 2024 5:07 UTC
        2 points
        1
        Parent
        Actually I think the residual decomposition is incorrect—see my other comment.
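
To make the "paths from input to output" in this thread concrete, here is a minimal Python sketch that enumerates every path in a toy factorized DAG. It rests on assumptions not stated in the thread: each layer is a block of parallel attention heads followed by a single MLP, and every node can read from every earlier node through the residual stream. The function name `enumerate_paths` and the node labels are hypothetical, chosen to match the "Attn layer.head" notation above. The sketch only lists paths; it does not compute the term attached to each path, and it takes no position on whether the output actually equals the sum of those terms, which the last comment disputes.

```python
from itertools import product


def enumerate_paths(n_layers, n_heads):
    """List every input-to-output path in a toy factorized DAG.

    Assumption: each layer is an attention block (n_heads parallel heads)
    followed by one MLP, and every node reads from all earlier nodes
    through the residual stream.
    """
    stages = []
    for layer in range(n_layers):
        # None means "skip this stage", i.e. follow the residual connection.
        stages.append([None] + [f"Attn {layer}.{h}" for h in range(n_heads)])
        stages.append([None, f"MLP {layer}"])

    paths = []
    # A path picks at most one node per stage, in layer order.
    for choice in product(*stages):
        nodes = [c for c in choice if c is not None]
        paths.append(["input", *nodes, "output"])
    return paths


paths = enumerate_paths(n_layers=2, n_heads=2)
print(len(paths))  # 36 = (1 + 2) * (1 + 1) * (1 + 2) * (1 + 1)
for p in paths[:5]:
    print(" -> ".join(p))
```

The first path returned is the direct input -> output edge, which corresponds to the identity term. Under these assumptions the number of paths is the product of (1 + nodes per stage), so it grows exponentially with depth.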