Neel Nanda comments on A Mechanistic Interpretability Analysis of Grokking

Neel Nanda 15 Aug 2022 17:08 UTC
LW: 17 AF: 4
Hmm. So firstly, I don’t think ML grokking and human grokking having the same name is that relevant—it could just be a vague analogy. And I definitely don’t claim to understand neuroscience!
That said, I’d guess there’s something relevant about phase changes? Internally, I know that I initially feel very confused, then have some intuition of ‘I half see some structure but it’s super fuzzy’, and then eventually things magically click into place. And maybe there’s some similar structure around how phase changes happen: useful explanations get reinforced, and the more useful an explanation becomes, the faster it gets reinforced, leading to a feeling of ‘clicking’.
It seems less obvious to me that human grokking looks like ‘stare at the same data points a bunch of times until things click’. It also seems plausible that there’s some transfer learning going on. Here I train models from scratch, but when I personally grok things it feels like I’m fitting the new problem into existing structure in my mind, maybe analogous to how induction heads get repurposed for few-shot learning.
- Rodrigo Heck 16 Aug 2022 18:47 UTC
  4 points
  
  > It seems less obvious to me that human grokking looks like ‘stare at the same data points a bunch of times until things click’
  
  Personally, I don’t think it’s that different, at least for language. When I read an unfamiliar word in a foreign language, my mind first tries to recall other times I saw that word without understanding it. Suppose I can remember 3 such instances. Now I have 3 + 1 examples in mind and, by extracting the context they share, I can finally deduce the meaning.