Nice work! It’s great to have empirical tests of how well one can approximate the learning coefficient in practice, and of how the coefficient corresponds to high-level properties of the model. This post is perhaps the best illustration of SLT’s applicability to practical problems that I know of, so thank you.
Question about the phase transitions: I don’t quite see the connection between the learning coefficient and phase transitions. You write:
In particular, these unstable measurements “notice” the grokking transition between memorization and generalization when training loss stabilizes and test loss goes down. (As our networks are quite efficient, this happens relatively early in training.)
For the first 5 checkpoints the test loss goes up, after which it goes (sharply) down. However, looking at the learning coefficient over the first 5 to 10 checkpoints, I can’t really pinpoint “ah, that’s where the model starts to generalize”. Sure, the learning coefficient starts to rise more sharply, but couldn’t that simply be explained by the training loss going down?