Arthur Conmy comments on Should we publish mechanistic interpretability research?

Arthur Conmy 21 Apr 2023 22:30 UTC
3 points
0
Which sorts of work are you referring to on Chris Olah’s blog? I see mostly vision interpretability work (which has not helped with vision capabilities), RNN material (which does essentially nothing for capabilities now that transformers have displaced RNNs), and one article on back-prop, which is more engineering-adjacent but probably replaceable (I’ve seen pretty similar explanations in at least one publicly available Stanford course).
- habryka 21 Apr 2023 23:13 UTC
  2 points
  0
  I’ve seen a lot of the articles here used in various ML syllabi: https://distill.pub/
  The basic things studied here transfer pretty well to other architectures. An understanding of the hierarchical nature of features transfers from vision to language, and indeed when I hear people talk about how features are structured in LLMs, they often use language borrowed from what we know about how they are structured in vision (e.g. metaphorical edge-detectors/syntax-detectors that then feed up into higher-level concepts).