Hey! Out of curiosity, has grokking been observed in any non-algorithmic dataset to date, or just these toy, algorithmic datasets?
Yes, it can be induced on MNIST by deliberately initializing the model's weights at a larger-than-standard scale (and training on a reduced subset of the data), as Omnigrok demonstrated.
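In case it helps, here is a rough sketch of what changing the initialization could look like in practice. It assumes the change amounts to multiplying an otherwise standard initialization by a factor greater than 1; the names `init_layer`, `weight_norm`, and `alpha`, the layer sizes, and the value 8.0 are all made up for illustration and are not taken from the Omnigrok code.

```python
import math
import random


def init_layer(fan_in, fan_out, alpha=1.0, seed=0):
    """Return a fan_out x fan_in weight matrix as nested lists.

    Weights are drawn uniformly within +/- 1/sqrt(fan_in), a common
    default bound, then multiplied by alpha. alpha is a hypothetical
    scale factor: alpha=1.0 keeps the usual initialization.
    """
    rng = random.Random(seed)
    bound = 1.0 / math.sqrt(fan_in)
    return [
        [alpha * rng.uniform(-bound, bound) for _ in range(fan_in)]
        for _ in range(fan_out)
    ]


def weight_norm(weights):
    """Frobenius norm of a nested-list weight matrix."""
    return math.sqrt(sum(x * x for row in weights for x in row))


# Same seed, so the second matrix is the first one scaled by alpha.
standard = init_layer(784, 200, alpha=1.0)
scaled = init_layer(784, 200, alpha=8.0)
ratio = weight_norm(scaled) / weight_norm(standard)
```

The only thing this shows is that the scaled model starts with a much larger weight norm than the standard one. Whether a given setup then actually groks depends on the rest of the training recipe, which this sketch doesn't cover.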
Got it, thanks!