Daniel Murfet comments on You’re Measuring Model Complexity Wrong

Daniel Murfet 12 Oct 2023 21:06 UTC
Not easily detected. That is, there might be a sudden change (measured in SGD steps) in the internal structure of the network during training that is not readily visible in the loss or in the other metrics you would normally track. If you think of the loss as an average of performance over many thousands of subtasks, a change in internal structure that is relevant to one subtask (e.g. a circuit appearing in a phase transition) may barely move the loss.
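A toy illustration of the averaging argument follows. It assumes, hypothetically, that the total loss is a uniform mean over `N_SUBTASKS` per-subtask losses, and that exactly one subtask undergoes a sharp drop at a single step while the others stay flat. All names and numbers are made up for illustration and are not taken from any real training run.

```python
# Toy illustration (hypothetical numbers): total loss is assumed to be a
# uniform average of per-subtask losses. One subtask undergoes a sudden
# "phase transition" at TRANSITION_STEP while every other subtask is flat.

N_SUBTASKS = 5000       # hypothetical number of subtasks
BASELINE_LOSS = 2.0     # hypothetical per-subtask loss before the transition
TRANSITION_STEP = 500   # hypothetical SGD step at which the circuit forms
LOSS_AFTER = 0.5        # hypothetical loss on the affected subtask afterwards


def subtask_losses(step):
    """Return the list of per-subtask losses at a given training step."""
    losses = [BASELINE_LOSS] * N_SUBTASKS
    if step >= TRANSITION_STEP:
        losses[0] = LOSS_AFTER  # only subtask 0 changes
    return losses


def average_loss(step):
    """Total loss, modeled as the mean over all subtasks."""
    losses = subtask_losses(step)
    return sum(losses) / len(losses)


before = average_loss(TRANSITION_STEP - 1)
after = average_loss(TRANSITION_STEP)

print(f"subtask 0 loss change: {BASELINE_LOSS - LOSS_AFTER:.4f}")
print(f"average loss change:   {before - after:.6f}")
```

The affected subtask's loss drops by 1.5, but the averaged loss moves by only 1.5 / 5000 = 0.0003. In general a change of size Δ on one of N equally weighted subtasks shifts the mean by Δ/N, which for large N can be small relative to the ordinary step-to-step fluctuations of the training loss.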