Noosphere89 comments on 200 COP in MI: Looking for Circuits in the Wild

Noosphere89 29 Dec 2022 21:12 UTC
LW: 3 AF: 2

Further, it’s helped to build out a toolkit of techniques to rigorously reverse engineer models. In the process of understanding this circuit, they refined the technique of activation patching into more sophisticated approaches such as path patching (and later causal scrubbing). And this has helped lay the foundations for developing future techniques! There are many interpretability techniques that are more scalable but less mechanistic, like probing. Having some

See a Twitter thread of some brief explorations I and Alex Silverstein did on this

I think you cut yourself off there both times.
- Neel Nanda 29 Dec 2022 21:17 UTC
  LW: 2 AF: 1
  Lol thanks. Fixed
  - Noosphere89 29 Dec 2022 21:18 UTC
    LW: 1 AF: 1
    You’re welcome, though did you miss a period here or did you want to write more?
    
    See a Twitter thread of some brief explorations I and Alex Silverstein did on this
    - Neel Nanda 29 Dec 2022 23:55 UTC
      LW: 2 AF: 1
      Missed a period (I’m impressed I didn’t miss more tbh, I find it hard to remember that you’re supposed to have them at the end of paragraphs)