You don’t talk about human analogs of grokking, and that makes sense for a technical paper like this. Nonetheless, grokking also seems to happen in humans, and everybody has had “Aha!” moments before. Can you maybe comment a bit on the relation to human learning? It seems clear that human grokking is not a process that purely depends on the number of training samples seen but also on the availability of hypotheses. People grok faster if you provide them with symbolic descriptions of what goes on. What are your thoughts on the representation and transfer of the resulting structure, e.g., via language/token streams?
Hmm. So firstly, I don’t think ML grokking and human grokking having the same name is that relevant—it could just be a vague analogy. And I definitely don’t claim to understand neuroscience!
That said, I’d guess there’s something relevant about phase changes? Internally, I know that I initially feel very confused, then have some intuition of ‘I half see some structure but it’s super fuzzy’, and then eventually things magically click into place. And maybe there’s some similar structure around how phase changes happen—useful explanations get reinforced, and as explanations become more useful they get reinforced much faster, leading to a feeling of ‘clicking’.
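The ‘explanations get reinforced faster as they become more useful’ dynamic above can be sketched as a toy model (my illustration, not anything from the paper): if an explanation’s strength grows at a rate proportional to its current strength, you get logistic growth, which looks flat for a long time and then suddenly ‘clicks’—a phase-change-like curve. The function name and parameter values here are made up for illustration.

```python
import math

def explanation_strength(t, r=0.5, w0=1e-4):
    """Toy logistic model: strength of an explanation after t units of practice.

    Solves dw/dt = r * w * (1 - w) with w(0) = w0. Growth is proportional
    to current strength, so the curve stays near zero for a long time and
    then rises sharply -- a rough analogue of a 'click' / phase change.
    """
    return 1.0 / (1.0 + (1.0 / w0 - 1.0) * math.exp(-r * t))

# Sample the curve: long flat stretch, then a rapid transition.
curve = [explanation_strength(t) for t in range(0, 41, 5)]
```

Nothing here depends on the details—any self-reinforcing dynamic with a saturation point produces the same qualitative ‘slow, slow, sudden’ shape.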
It seems less obvious to me that human grokking looks like ‘stare at the same data points a bunch of times until things click’. It also seems plausible that there’s some transfer learning going on—here I train models from scratch, but when I personally grok things it feels like I’m fitting the new problem into existing structure in my mind—maybe analogous to how induction heads get repurposed for few-shot learning.
It seems less obvious to me that human grokking looks like ‘stare at the same data points a bunch of times until things click’
Personally, I don’t think it’s that different, at least for language. When I read some unrecognizable word in a foreign language, my mind first tries to retrieve other times I have seen this word but haven’t understood it. Suppose I can remember 3 of these instances. Now I have 3 + 1 examples in my mind and, extracting the similar context they share, I can finally deduce the meaning.
I have a distinct, strong memory of grokking at a very young age (naturally well before I heard the term), learning the basic idea of base-10 digit increments from a long list of integer-sequence examples. What I find most interesting is how rewarding grokking is. There’s this clear, strong ‘aha!’ reward feeling in the brain when one learns a low-complexity rule that compresses some (presumably important) high-level sensory token stream.
There’s a pretty strong Bayesian argument that learning is sensory compression, and so this is just the general low-level mechanism underlying much of the brain, but at the same time it seems to have a high-level System 2 counterpart.
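The ‘low-complexity rule compresses the stream’ point can be made concrete with a back-of-the-envelope MDL-style comparison (my framing, and the bit costs are rough assumptions): storing the base-10 counting stream digit by digit costs thousands of bits, while ‘count up in base 10, for this many steps’ costs a couple hundred.

```python
import math

# The sensory stream: the first 1000 non-negative integers as one digit string,
# standing in for the 'long list of integer sequence examples'.
stream = "".join(str(n) for n in range(1000))

# Verbatim encoding: roughly log2(10) bits per digit, no structure exploited.
verbatim_bits = len(stream) * math.log2(10)

# Rule-based encoding: a short program ('count up in base 10') plus the
# stream length. Charging a generous 200 bits for the rule itself.
rule_bits = 200 + math.log2(len(stream))
```

On these (crude) assumptions the rule encoding is well over an order of magnitude smaller, which is the sense in which grokking the rule is a big compression win.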