Thanks for doing this research, it’s super interesting. I have a couple of questions about how this relates (or doesn’t relate) to LLMs:
You mention that LLMs probably don’t do anything like this since, empirically, repeated data harms test performance, so it is avoided. I’m wondering whether we’re looking at the same thing, though. If you trained an LLM with a subset of the data repeated N times for many epochs, wouldn’t you expect it to eventually, thanks to regularization, learn some general algorithm governing the repetition; i.e. it would grok, but it would grok the repetition itself, not whatever the text in the dataset is about?
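To be concrete, the sort of setup I have in mind is something like the sketch below (purely illustrative Python on my part, not anything from your post; all names are mine):

```python
# Rough sketch: repeat a small random subset of the corpus N times and mix it
# back in, so over many epochs the model sees those examples far more often.
import random

def build_repeated_corpus(corpus, repeat_fraction=0.01, n_repeats=100, seed=0):
    """Duplicate a random `repeat_fraction` of examples `n_repeats` times each."""
    rng = random.Random(seed)
    repeated = rng.sample(corpus, int(len(corpus) * repeat_fraction))
    mixed = list(corpus) + repeated * (n_repeats - 1)
    rng.shuffle(mixed)
    return mixed
```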
Similarly, what would happen if you trained a modular addition model similar to the one you study (but with a far larger modulus) on lots of never-duplicated data? My impression is that it wouldn’t exactly grok, since memorization probably isn’t very useful there, but that it would learn general patterns that help it make better predictions, and eventually converge on the general algorithm that works on its own. Does that seem right? If so, do you think it’s likely that something like this goes on in LLMs?
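By “never-duplicated” I mean something like the following, where the modulus is large enough that the model essentially never sees the same pair twice (again, just an illustrative sketch with made-up names):

```python
# Rough sketch: pick a large modulus p and sample (a, b) pairs without
# replacement, so no training example ever repeats.
import random

def single_pass_mod_add_data(p=10007, n_examples=1_000_000, seed=0):
    """Yield (a, b, (a + b) % p) with each (a, b) pair appearing at most once."""
    rng = random.Random(seed)
    for idx in rng.sample(range(p * p), n_examples):  # indices into the p*p grid of pairs
        a, b = divmod(idx, p)
        yield a, b, (a + b) % p
```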
I’m not sure at what level you normally de-duplicate training data for LLMs, but I assume it’s at a fairly high level (e.g. paragraph or document). If so, the training data will presumably still contain lots of recurring lower-level phrases. Perhaps for some specific task (addition, say), the relevant examples are repeated many times in the dataset but don’t cover all possible inputs (similar to your experiment training for many epochs on 30% of possible inputs, except that here it could all happen within a single epoch). If so, couldn’t the same sort of grokking happen in an LLM? E.g. first the LLM memorizes answers to additions of single- and double-digit numbers, but eventually it may, due to regularization, grok addition.
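Roughly, the analogy to your 30%-of-inputs experiment would be something like this (illustrative only):

```python
# Rough sketch of the analogy: only ~30% of all (a, b) pairs show up in training
# (each of those possibly repeated many times within one pass over web text),
# while the rest are effectively held out.
import random

def addition_coverage_split(max_n=99, covered_frac=0.3, seed=0):
    """Split all pairs up to max_n into covered (seen in training) and held-out pairs."""
    rng = random.Random(seed)
    pairs = [(a, b) for a in range(max_n + 1) for b in range(max_n + 1)]
    rng.shuffle(pairs)
    cut = int(covered_frac * len(pairs))
    return pairs[:cut], pairs[cut:]
```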