habryka comments on Should we publish mechanistic interpretability research?

habryka 21 Apr 2023 23:13 UTC
2 points
I’ve seen a lot of the articles here used in various ML syllabi: https://distill.pub/
The basic things studied here transfer pretty well to other architectures. An understanding of the hierarchical nature of features transfers from vision to language, and indeed when I hear people talk about how features are structured in LLMs, they often use language borrowed from what we know about how they are structured in vision (e.g. metaphorical edge-detectors/syntax-detectors that then feed up into higher-level concepts).