I would be interested in what the current SLT dogma on grokking is. I get asked all the time whether SLT explains grokking, but I always have to reply with an unsatisfying ‘there’s probably something there, but I don’t understand the details’.
@Zach Furman @Jesse Hoogland
IIRC @jake_mendel and @Kaarel have thought about this more, but my rough recollection is: a simple story about the regularization seems sufficient to explain the training dynamics, so a fancier SLT story isn’t obviously necessary. My guess is that there’s probably something interesting you could say using SLT, but nothing that simpler arguments about the regularization wouldn’t also tell you. That said, I haven’t thought about this enough.