but as ML moves to larger and larger models, why should that matter? The answer, I think, is that larger models generalize better not because they're more complex, but because they're actually simpler.
On its face, this statement seems implausible to me. Are you saying, for instance, that the model of a dog that a human has in their head should be simpler than the model of a dog that an image classifier has?
What double descent definitely says is that, for a fixed dataset, larger models with zero training error are simpler than smaller models with zero training error. I think it says somewhat more than that as well: larger models do have a real tendency to be better at finding simpler models in general. That being said, the dataset on which the concept of a dog in your head was trained is presumably far larger than that of any ML model, so even if your brain is really good at implementing Occam's razor and finding simple models, your model is still probably going to be more complicated.
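Not part of the original exchange, but here is a minimal sketch of the kind of experiment the fixed-dataset claim points to. It uses random ReLU features with minimum-norm least squares as a stand-in for "larger models with zero training error," and the weight norm as a crude proxy for simplicity; the sine target, sample sizes, and noise level are all arbitrary choices of mine. The qualitative point it usually illustrates is that, once the model is wide enough to interpolate, making it wider tends to give an interpolating solution with a smaller norm (and often better test error) than the one found right at the interpolation threshold.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed dataset: n_train noisy samples of a 1-D target function.
n_train, n_test, noise = 20, 200, 0.1
x_train = rng.uniform(-1, 1, size=(n_train, 1))
x_test = np.linspace(-1, 1, n_test).reshape(-1, 1)
y_train = np.sin(3 * x_train).ravel() + noise * rng.standard_normal(n_train)
y_test = np.sin(3 * x_test).ravel()

def random_relu_features(x, n_features, seed=0):
    """Fixed random ReLU features: phi(x) = relu(x @ W + b), same W, b for train and test."""
    fr = np.random.default_rng(seed)
    W = fr.standard_normal((x.shape[1], n_features))
    b = fr.standard_normal(n_features)
    return np.maximum(x @ W + b, 0.0)

# Sweep model size past the interpolation threshold (n_features ~ n_train).
for n_features in [5, 10, 20, 40, 100, 1000]:
    phi_train = random_relu_features(x_train, n_features)
    phi_test = random_relu_features(x_test, n_features)
    # np.linalg.lstsq returns the minimum-norm solution when the system is
    # underdetermined, i.e. when the model is wide enough to fit the data exactly.
    w, *_ = np.linalg.lstsq(phi_train, y_train, rcond=None)
    train_mse = np.mean((phi_train @ w - y_train) ** 2)
    test_mse = np.mean((phi_test @ w - y_test) ** 2)
    print(f"features={n_features:5d}  ||w||={np.linalg.norm(w):8.3f}  "
          f"train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

Whether the full double-descent curve (a test-error peak at the threshold, then a second descent) shows up cleanly depends on the noise level and sizes chosen, but the shrinking weight norm of the wider interpolating solutions is the "bigger models end up simpler" behavior being discussed above.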