For instance, the phenomenon of in-context learning in language models used to be considered a black box; it has since been explained through interpretability efforts.
Has it? I’m quite skeptical. (Separately, only a small fraction of the efforts you’re talking about are well described as mech interp or would require tools like these.)
I don’t really want to get into this, just registering my skepticism. I’m quite familiar with the in-context learning and induction heads work, assuming that’s what you’re referring to.
+1 that I’m still fairly confused about in-context learning; induction heads seem like a big part of the story, but we’re still confused about those too!
I agree that in-context learning is not entirely explained yet, but we’re not completely in the dark about it. We have some understanding of, and some directions for explaining, where this ability might stem from, and it’s only going to get much clearer from here.