Fascinating paper!
Here’s a drive-by question: have you considered experiments that might differentiate between the lottery ticket explanation and the evolutionary explanation?
In particular, your reasoning that the formation of induction heads on the repeated-subsequence tasks disproves the evolutionary explanation seems intuitively sound, but not quite bulletproof. Maybe the model has incentives to develop next-token heads that don’t depend on an induction head existing? I dunno, I might have an insufficient understanding of what induction heads do.