That’s within-training, by epoch/iteration, not across trained models by total size/compute. It’s not clear that they are the same sort of thing at all, because you can get spikes trivially from things like the learning rate dropping. Investigating whether there is any connection would be interesting.