Experience Machine comments on Stan van Wingerden’s Shortform

Experience Machine 12 Dec 2024 11:26 UTC
3 points
Using almost the same training parameters as above (I used full batch and train_frac=0.5 to get faster and more consistent grokking, but I don't think this matters here), I did a few runs and the results all looked more or less like this. The training process of such toy models doesn't contain that many bits of interesting information, so I wouldn't be surprised if a variety of different metrics captured this process equally well in this case. (E.g. the training dynamics can also be modelled by an HMM, see here).
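For concreteness, here is a minimal sketch of what "full batch and train_frac=0.5" could look like. It assumes a modular-addition toy task, which is only a guess at the setup referred to as "above"; the modulus `p`, the `seed`, and the helper `make_split` are hypothetical names for illustration, not the actual training code.

```python
import random
from itertools import product


def make_split(p=113, train_frac=0.5, seed=0):
    """Split all (a, b) pairs of an assumed (a + b) mod p task into train/test."""
    # Enumerate every input pair, then shuffle deterministically.
    pairs = list(product(range(p), repeat=2))
    random.Random(seed).shuffle(pairs)

    # The first train_frac of the shuffled pairs is the training set.
    n_train = int(train_frac * len(pairs))
    train, test = pairs[:n_train], pairs[n_train:]

    def labels(xs):
        return [(a + b) % p for a, b in xs]

    return (train, labels(train)), (test, labels(test))


(train_x, train_y), (test_x, test_y) = make_split()

# "Full batch": every optimizer step sees the whole training set at once.
batch_size = len(train_x)
```

With full batch there is no minibatch sampling noise, so the only run-to-run randomness left is the split and the initialization, which would fit the runs looking alike.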