Further, it’s helped to build out a toolkit of techniques to rigorously reverse engineer models. In the process of understanding this circuit, they refined the technique of activation patching into more sophisticated approaches such as path patching (and later causal scrubbing). And this has helped lay the foundations for developing future techniques! There are many interpretability techniques that are more scalable but less mechanistic, like probing. Having some
See a Twitter thread of some brief explorations Alex Silverstein and I did on this
I think you cut yourself off there both times.
Lol thanks. Fixed
You’re welcome, though did you miss a period here or did you want to write more?
Missed a period (I’m impressed I didn’t miss more tbh, I find it hard to remember that you’re supposed to have them at the end of paragraphs)