Good stuff. A few thoughts:
1. Assume a model has memorized the training data and still has enough “spare capacity” to play the lottery ticket hypothesis, i.e. to search for solutions that generalize to subsets of the memorized data. Given some form of regularization towards simplicity, you’ll eventually end up with a number of partial solutions, each generalizing to some subset of that data. This may be where the “underparametrized” regime of ML of the past went wrong: that approach tried to force the model into generalization without memorization, but by being stingy with parameters it forced the model to first and foremost memorize; there was no spare capacity left to “play / experiment with possibly generalizing solutions”. This then led to memorization-only models, to which researchers reacted by restricting parameters even more …
2. Occam’s razor favors simpler models (for some definition of simplicity) over more complex models, given equal predictive power. The best definition of “model simplicity” we have may in fact be the Kolmogorov complexity of the weight matrices. This would mean that if we want a model to apply Occam’s razor, we should see whether a measure of the Kolmogorov complexity of the weights can be used as a regularizer. The “best” approximation we currently have for Kolmogorov complexity is … compression, which is itself a prediction problem. So perhaps the way to encourage good generalization is to measure how well the weights can be predicted by another model, and penalize weights that resist prediction (a rough sketch follows after this list). Apologies if this sounds like a crackpot idea.
3. It might be worth experimenting with changing the regularization term during training: initially encouraging wide connectivity, then shifting towards either sparsity or low Kolmogorov complexity (also sketched below). There’s an intriguing parallel to synaptic pruning in childhood.
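For what it’s worth, here is one way the “predict the weights with another model” idea from point 2 could be prototyped. Everything below is a toy sketch rather than a tested method: the MainNet / WeightPredictor names, the coordinate-MLP framing (predict each weight from its layer/row/column position), and all coefficients are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MainNet(nn.Module):
    """The model we actually care about (toy regression net)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x)

class WeightPredictor(nn.Module):
    """Tiny 'compressor': given the (layer, row, col) position of a weight,
    predict its value. Weights that are easy to predict count as 'simpler'."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, coords):
        return self.net(coords)

def weight_coords_and_values(model):
    """Flatten all 2-D weight matrices into (coordinate, value) pairs."""
    coords, values = [], []
    mats = [p for p in model.parameters() if p.dim() == 2]
    for layer_idx, p in enumerate(mats):
        rows, cols = p.shape
        r, c = torch.meshgrid(torch.arange(rows), torch.arange(cols), indexing="ij")
        l = torch.full_like(r, layer_idx)
        coords.append(torch.stack([l, r, c], dim=-1).reshape(-1, 3).float())
        values.append(p.reshape(-1, 1))
    return torch.cat(coords), torch.cat(values)

main, predictor = MainNet(), WeightPredictor()
opt_main = torch.optim.Adam(main.parameters(), lr=1e-3)
opt_pred = torch.optim.Adam(predictor.parameters(), lr=1e-3)
lam = 1e-2                                          # regularization strength (placeholder)
x, y = torch.randn(256, 20), torch.randn(256, 1)    # stand-in dataset

for step in range(1000):
    coords, values = weight_coords_and_values(main)

    # Predictor update: try to predict ("compress") the current weights.
    pred_loss = F.mse_loss(predictor(coords), values.detach())
    opt_pred.zero_grad()
    pred_loss.backward()
    opt_pred.step()

    # Main update: task loss plus a penalty for weights that resist prediction.
    residual = F.mse_loss(predictor(coords).detach(), values)
    loss = F.mse_loss(main(x), y) + lam * residual
    opt_main.zero_grad()
    loss.backward()
    opt_main.step()
```

Whether the prediction residual is a useful stand-in for compressibility, and whether the alternating updates are even stable, are exactly the empirical questions.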
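And a similarly hedged sketch of point 3: a regularization schedule that leaves connectivity unpenalized early on and ramps up an L1 sparsity term later in training, loosely the “pruning” phase. The halfway cutoff and the coefficient are arbitrary placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(20, 128), nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(512, 20), torch.randn(512, 1)   # stand-in dataset

total_epochs = 100
max_l1 = 1e-3   # final sparsity strength (placeholder)

def sparsity_coeff(epoch):
    """No sparsity pressure for the first half of training ("wide connectivity"),
    then a linear ramp up to max_l1 (the "pruning" phase)."""
    progress = (epoch - total_epochs / 2) / (total_epochs / 2)
    return max_l1 * max(0.0, progress)

for epoch in range(total_epochs):
    lam = sparsity_coeff(epoch)
    l1 = sum(p.abs().sum() for p in model.parameters())
    loss = F.mse_loss(model(x), y) + lam * l1
    opt.zero_grad()
    loss.backward()
    opt.step()
```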
Interesting idea. The natural next question is: how would you use that second model to determine the Kolmogorov complexity (or some similar metric) of the first model’s weights? Say you use the complexity of the second model as a stand-in, assuming it is the simplest possible model that can predict the first model’s weights. But to satisfy that assumption, you’d need a third model, used in a similar way, to minimize the complexity of the second. And so on. Eventually you have to determine the complexity of some model’s weights without training yet another model, using some direct metric (whether it’s weight norms, performance after pruning, or [insert clever method from the future]). Why not just apply that metric to the first model and skip training the additional ones?
That said, I could be overlooking something, and empirical results might suggest otherwise, so it could still be worth testing the idea out. (A cheap example of the kind of direct metric I mean is sketched below.)
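To make the “just apply a direct metric to the first model” point concrete, here is one crude, training-free complexity proxy: quantize the weights and see how much an off-the-shelf compressor can shrink them. Purely illustrative; it isn’t differentiable, so it would serve as a monitoring or model-selection metric rather than a loss term, and the 8-bit quantization is an arbitrary choice.

```python
import zlib
import numpy as np
import torch
import torch.nn as nn

def compressed_weight_bytes(model, bits=8):
    """Quantize each parameter tensor and return the zlib-compressed size in bytes.
    Smaller output ~ more regular / 'simpler' weights, very roughly."""
    chunks = []
    for p in model.parameters():
        w = p.detach().cpu().numpy().ravel()
        scale = np.abs(w).max() + 1e-12                            # avoid division by zero
        q = np.round(w / scale * (2 ** (bits - 1) - 1)).astype(np.int8)
        chunks.append(q.tobytes())
    return len(zlib.compress(b"".join(chunks), 9))

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
print(compressed_weight_bytes(model))
```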