I don’t think that explicitly aiming for grokking is an efficient way to improve the training of realistic ML systems. This is partly because grokking, by definition, requires that the model first memorize the training data before generalizing; if you care about actual performance, you should aim for immediate generalization instead.
Further, the known methods for hastening grokking largely amount to standard ML practice: tuning the hyperparameters or the initialization distribution, or training on more data (see the sketch below).
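As a concrete illustration, grokking is most often studied on small algorithmic tasks such as modular addition. The following is a minimal sketch in PyTorch, not any particular paper’s code: the architecture, train fraction, and optimizer settings are all illustrative assumptions, with weight decay being the hyperparameter most commonly reported to shorten the memorization phase.

```python
# Minimal sketch of a typical grokking setup (all values are illustrative):
# modular addition with a small MLP, trained with heavy weight decay.
import torch
import torch.nn as nn

P = 97            # modulus for the task (a + b) mod P
FRAC_TRAIN = 0.3  # small train fraction encourages memorize-first dynamics

# Build the full (a, b) -> (a + b) mod P dataset, then split it.
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
labels = (pairs[:, 0] + pairs[:, 1]) % P
perm = torch.randperm(len(pairs))
n_train = int(FRAC_TRAIN * len(pairs))
train_idx, test_idx = perm[:n_train], perm[n_train:]

model = nn.Sequential(
    nn.Embedding(P, 128),  # shared embedding for both operands
    nn.Flatten(),          # (batch, 2, 128) -> (batch, 256)
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, P),
)

# weight_decay=1.0 is an illustrative assumption, not a tuned value;
# weight decay is the knob most often credited with hastening grokking.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(50_000):
    opt.zero_grad()
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            preds = model(pairs[test_idx]).argmax(-1)
            test_acc = (preds == labels[test_idx]).float().mean()
        print(f"step {step}: train loss {loss.item():.4f}, "
              f"test acc {test_acc.item():.3f}")
```

In runs like this, train accuracy typically saturates long before test accuracy moves, which is exactly the delayed-generalization pattern at issue; the point above is that closing that gap comes down to ordinary knobs, not a grokking-specific technique.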