If you train on infinite data, I assume you wouldn't see a delay between training and testing, but you would expect a non-monotonic accuracy curve that looks roughly like the test-accuracy curve in the finite-data regime? So I assume infinite data is also cheating?
I'd still expect a delay even in the infinite-data case, I think?
Although I'm not quite sure what you mean by "infinite data" here. If the argument is that every data point will have been seen during training, then I agree there won't be any delay. But yes, training on the test set (even via "we train on everything, so there is no possible test set") counts as cheating for this purpose.
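The ambiguity above is between two readings of "infinite data": a fixed finite task where enough sampling eventually covers every point (so the held-out set gets trained on), versus a stream of fresh samples. A minimal toy sketch of the distinction, using a hypothetical small discrete task (modular pairs, chosen only for illustration):

```python
import random

# Toy stand-in for a small discrete task: all input pairs mod P.
P = 23
ALL_PAIRS = [(a, b) for a in range(P) for b in range(P)]

def finite_regime(train_frac=0.5, seed=0):
    """Finite-data regime: split once; the test set is never trained on."""
    rng = random.Random(seed)
    pairs = ALL_PAIRS[:]
    rng.shuffle(pairs)
    cut = int(train_frac * len(pairs))
    return set(pairs[:cut]), set(pairs[cut:])

def infinite_regime(steps, seed=0):
    """'Infinite data' as fresh i.i.d. samples each step: on a finite
    task, the stream eventually covers every point, including any
    would-be test point."""
    rng = random.Random(seed)
    return {rng.choice(ALL_PAIRS) for _ in range(steps)}

train, test = finite_regime()
seen = infinite_regime(steps=20000)

# Finite regime: train and test are disjoint by construction.
print(len(train & test))  # 0
# Infinite regime: count of "test" points the stream has NOT yet
# trained on -- after enough steps this hits 0, so there is no
# held-out set left, which is the "cheating" reading above.
print(len(test - seen))
```

On a genuinely unbounded task distribution, by contrast, fresh samples never exhaust the input space, so a train/test delay can still be meaningful there.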