My guess is that memorisation isn’t really a discrete thing. It’s not that the model has either memorised a data point or not; it’s more that it’s fitting a messed-up curve to approximate all the training data as well as it can. More parameters means it memorises every data point a bit better, rather than having excellent loss on some data and terrible loss on others, with the excellent set gradually expanding.
I haven’t properly tested this though! And I’m using full batch training, which probably messes this up a fair bit.
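A minimal sketch of how one might tell the two pictures apart (the toy task, hyperparameters, and the 0.1 “memorised” threshold below are all made up for illustration, not the actual experiment): train MLPs of increasing width full-batch on random labels and look at the distribution of per-example losses. Discrete memorisation predicts a bimodal distribution with a growing low-loss cluster; the curve-fitting picture predicts the whole distribution shifting down together.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy memorisation task: random inputs with random labels, so the only way
# to get low loss is to memorise individual points. (Hypothetical setup.)
N, D, C = 512, 32, 10
X = torch.randn(N, D)
y = torch.randint(0, C, (N,))

def per_example_losses(width: int, steps: int = 3000, lr: float = 1e-3) -> torch.Tensor:
    """Train a one-hidden-layer MLP full-batch; return each point's final loss."""
    model = nn.Sequential(nn.Linear(D, width), nn.ReLU(), nn.Linear(width, C))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(X), y).backward()  # full-batch, as in the comment above
        opt.step()
    with torch.no_grad():
        # The shape of this distribution is the quantity of interest, not its mean:
        # bimodal => discrete memorisation; uniformly shifting => curve fitting.
        return nn.CrossEntropyLoss(reduction="none")(model(X), y)

for width in (16, 64, 256):
    losses = per_example_losses(width)
    print(f"width={width:4d}  "
          f"median={losses.median().item():.3f}  "
          f"90th pct={losses.quantile(0.9).item():.3f}  "
          f"frac < 0.1 = {(losses < 0.1).float().mean().item():.2f}")
```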
Interesting hypothesis, thanks!