PoignardAzur comments on A Mechanistic Interpretability Analysis of Grokking

PoignardAzur 1 Jul 2023 18:12 UTC
1 point
Fascinating paper!

Here’s a drive-by question: have you considered experiments that might differentiate between the lottery ticket explanation and the evolutionary explanation?

In particular, your reasoning that the formation of induction heads on the repeated-subsequence tasks disproves the evolutionary explanation seems intuitively sound, but not quite bulletproof. Maybe the model has incentives to develop next-token heads that don’t depend on an induction head existing? I dunno, I might have an insufficient understanding of what induction heads do.