Ben comments on A Mechanistic Interpretability Analysis of Grokking

Ben 16 Aug 2022 12:12 UTC
You suggest that the model first quickly memorises a chunk of the dataset, then slowly learns rules that let it simplify itself. (You mention that noticing x + y = y + x could allow it to halve its memorised library with no loss.)
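To make the "halve its library" point concrete, here is a toy sketch. It assumes the task is modular addition (x + y) mod p, with p = 113 picked only for illustration, and the helper names are hypothetical. A table that exploits commutativity stores only pairs with x <= y, so it needs p(p+1)/2 entries instead of p². That is slightly more than half, because the diagonal pairs x = y have no partner to share with.

```python
# Toy sketch (hypothetical helpers): a memorised lookup table for modular
# addition, before and after exploiting commutativity (x + y = y + x).
P = 113  # illustrative modulus

def full_table(p):
    """Memorise every ordered pair (x, y) -> (x + y) mod p."""
    return {(x, y): (x + y) % p for x in range(p) for y in range(p)}

def commutative_table(p):
    """Store only pairs with x <= y; (y, x) is answered by swapping the key."""
    return {(x, y): (x + y) % p for x in range(p) for y in range(x, p)}

def lookup(table, x, y):
    # Canonicalise the key so that (y, x) hits the entry stored as (x, y).
    key = (x, y) if x <= y else (y, x)
    return table[key]

full = full_table(P)
half = commutative_table(P)
print(len(full), len(half))  # 12769 6441

# The smaller table answers every query the full table does, with no loss.
assert all(lookup(half, x, y) == full[(x, y)] for x in range(P) for y in range(P))
```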
I am interested in another way this process could go. We could imagine that the early model does something like “Check for the input in my library of known results; if it's not in there, just guess something.” So the model has two parts, [memorised] and [guess], with some kind of fork to decide between them.
At first the guess is useless and improves very slowly, so making sure the memorised information is correct and compact accounts for most of the improvement. Eventually the memory saturates, and the only way to progress is to refine the guess so the model does slightly less awfully on the cases it hasn't memorised. In this picture the guess steadily improves, but these improvements don't show up in performance, because the guess is still a fallback option that is rarely used. The phase change is then presumably the point where the “guess” becomes so good that the model “realises” it no longer needs its memorised library or the “if” tree.
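A toy sketch of this two-part picture, written as explicit code rather than as anything a trained network literally contains. It assumes modular addition as the task; `make_model`, the three guesses, P = 13 and the train/test split are all hypothetical choices for illustration. Because every training pair is in the library, training accuracy stays at 1.0 whatever the guess is, so improvements to the guess only show up on held-out pairs. Once the guess is the full rule, dropping the library changes nothing.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[int, int]
P = 13  # small illustrative modulus

def make_model(memory: Dict[Pair, int], guess: Callable[[int, int], int]) -> Callable[[int, int], int]:
    """Hypothetical two-part model: memorised library plus a fallback guess."""
    def model(x: int, y: int) -> int:
        if (x, y) in memory:      # the "fork": use the library when possible
            return memory[(x, y)]
        return guess(x, y)        # otherwise fall back to the guess
    return model

def accuracy(model: Callable[[int, int], int], data: List[Tuple[Pair, int]]) -> float:
    """Fraction of ((x, y), label) examples the model gets right."""
    return sum(model(x, y) == z for (x, y), z in data) / len(data)

all_data = [((x, y), (x + y) % P) for x in range(P) for y in range(P)]
train = [d for i, d in enumerate(all_data) if i % 3 != 0]  # arbitrary deterministic split
test = [d for i, d in enumerate(all_data) if i % 3 == 0]
memory = dict(train)  # the library has memorised every training pair

guesses = {
    "constant": lambda x, y: 0,              # useless guess
    "copy x": lambda x, y: x,                # right only when y == 0
    "algorithm": lambda x, y: (x + y) % P,   # the general rule
}

for name, guess in guesses.items():
    model = make_model(memory, guess)
    print(f"{name:>9}: train={accuracy(model, train):.2f} test={accuracy(model, test):.2f}")

# Once the guess is the full rule, the library and the fork are redundant.
rule_only = make_model({}, guesses["algorithm"])
print("rule alone:", accuracy(rule_only, all_data))
```

In this toy the "guess" jumps between hand-written rules; in the hypothesis it would improve gradually during training, but the point about training accuracy hiding the improvement is the same.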
These two pictures are quite different. In one, information about the problem makes the library more efficient, eventually leading to an algorithm; in the other, the two strategies (memory, algorithm) are essentially independent, and one eventually makes the other redundant.
Just the random thoughts of someone who is interested but has no knowledge of the topic.
- Neel Nanda 16 Aug 2022 17:04 UTC
  Interesting hypothesis, thanks!
  My guess is that memorisation isn't really a discrete thing. It's not that the model has either memorised a data point or not; rather, it's fitting a messed-up curve that approximates all the training data as well as it can. More parameters mean it memorises all data points a bit better, not that it has excellent loss on some data and terrible loss on others, with the excellent set gradually expanding.
  I haven’t properly tested this though! And I’m using full batch training, which probably messes this up a fair bit.