Arthur Conmy comments on A Mechanistic Interpretability Analysis of Grokking

Arthur Conmy 9 Jan 2023 7:33 UTC
2 points
You can observe “double descent” in test loss curves in the grokking setting, and test set performance also shows “grokking” as model dimension is increased, as this paper points out.