Fair point, I'll add that to the post. The main reason I recommend it so highly and prominently is that I think it builds valuable conceptual frameworks for reasoning about the pieces of a transformer, even if it somewhat overclaims how far it gets in interpreting tiny attention-only models. I think those broad intuitions still stand even after your critiques. E.g., strict induction heads are a good example of the kind of algorithm that can be implemented with attention, even if that algorithm isn't fully faithful to the underlying model. But I agree that these are worthwhile caveats to have in mind when reading, and the paper shouldn't be blindly recommended.
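For concreteness, here is a minimal sketch of the idealized "strict" induction rule as I understand it: look back for the most recent earlier occurrence of the current token and predict whatever followed it. The function name `strict_induction_predict` is made up for illustration, and this is plain token matching rather than anything extracted from a real model, so treat it as a picture of the idealized algorithm and not of what actual heads compute.

```python
def strict_induction_predict(tokens):
    """Idealized strict induction rule (illustrative only, hypothetical name).

    Finds the most recent earlier occurrence of the current (last) token
    and returns the token that followed it. Returns None when the current
    token has not appeared earlier in the sequence.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan backwards over earlier positions, skipping the current one.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # Copy forward the token that came right after the match.
            return tokens[i + 1]
    return None


if __name__ == "__main__":
    print(strict_induction_predict(["A", "B", "C", "A"]))
```

A real head would do this softly through attention scores instead of exact matching, which is part of why the idealized version is only an approximation.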
Thanks! I agree that thinking through the idealized induction head algorithm seems healthy, but I think it's important to know that this algorithm isn't much of what those heads are actually doing!