For instance, the phenomenon of in-context learning in language models used to be considered a black box; it has since been explained through interpretability efforts.
Has it? I’m quite skeptical. (Separately, only a small fraction of the efforts you’re talking about are well described as mech interp or would require tools like these.)
I don’t really want to get into this, just registering my skepticism. I’m quite familiar with the in-context learning and induction heads work, assuming that’s what you’re referring to.
+1 that I’m still fairly confused about in-context learning; induction heads seem like a big part of the story, but we’re still confused about those too!
I agree that in-context learning is not entirely explained yet, but we’re not completely in the dark about it. We have some understanding of, and some directions for explaining, where this ability might stem from, and it’s only going to get much clearer from here.