Thanks for the interest! I’m not really sure what you mean, though. By components, do you mean circuits or shards or...? I’m also not sure what you mean by clarifying or deconfusing components. This sounds like interpretability, but there’s not much interpretability going on in the linked project. Feel free to elaborate, though, and I’ll try to respond again.
Hello there! By components in my comment, I meant things like the attention mechanism itself. For reference, here are the mean weights of two models I’m studying.