I’m pretty sure my framework doesn’t apply to grokking. I usually think about training as ending once we hit zero training loss, whereas grokking happens much later.