Jesse Hoogland comments on Gradient surfing: the hidden role of regularization

Jesse Hoogland 6 Feb 2023 16:53 UTC
2 points
0
Thanks Lawrence! I had missed the slingshot mechanism paper, so this is great!

(As an aside, I also think grokking is not very interesting to study—if you want a generalization phenomena to study, I’d just study a task without grokking, and where you can get immediately generalization or memorization depending on hyperparameters.)

I totally agree on there being much more interesting tasks than grokking with modulo arithmetic, but it seemed like an easy way to test the premise.

Also worth noting that grokking is pretty hyperparameter sensitive—it’s possible you just haven’t found the right size/form of noise yet!

I will continue the exploration!