You can observe “double descent” in test loss curves in the grokking setting, and test-set performance shows “grokking” as model dimension is increased, as this paper points out.