I don’t understand why we want a theoretical explanation of neural network generalization to have the same “parts we can examine” as a neural network. If we could describe a prior such that Bayesian updating on that prior gave the same generalization behavior as neural networks, then this would not “explain feature learning”, right? But it would still be a perfectly useful theoretical account of NN generalization.
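To make concrete what I mean by “Bayesian updating on a prior”: if the prior is, say, a Gaussian process over functions, conditioning on the training data gives a closed-form posterior, and the generalization behavior is entirely a property of the kernel; no “neurons” appear anywhere in the account. Here is a minimal sketch, with an RBF kernel standing in for whatever the network’s implicit prior actually is (the kernel choice and the toy data are purely illustrative):

```python
import jax
import jax.numpy as jnp

def rbf_kernel(x1, x2, lengthscale=1.0):
    # Squared-exponential kernel: purely a stand-in for whatever the
    # network's implicit prior over functions actually is.
    sq_dists = jnp.sum((x1[:, None, :] - x2[None, :, :]) ** 2, axis=-1)
    return jnp.exp(-0.5 * sq_dists / lengthscale**2)

def gp_posterior_mean(x_train, y_train, x_test, noise=1e-3):
    # "Bayesian updating on the prior": condition the GP prior on the
    # training data and read off the posterior predictive mean at x_test.
    k_tt = rbf_kernel(x_train, x_train)
    k_st = rbf_kernel(x_test, x_train)
    alpha = jnp.linalg.solve(k_tt + noise * jnp.eye(len(x_train)), y_train)
    return k_st @ alpha

# Toy usage: how the posterior mean fills in the gaps between training
# points (i.e. how it generalizes) is determined entirely by the kernel.
key = jax.random.PRNGKey(0)
x_train = jax.random.uniform(key, (20, 1), minval=-3.0, maxval=3.0)
y_train = jnp.sin(2 * x_train[:, 0])
x_test = jnp.linspace(-3.0, 3.0, 100)[:, None]
print(gp_posterior_mean(x_train, y_train, x_test)[:5])
```

A theoretical account of this shape would tell us what the effective kernel/prior is, not what any intermediate neuron is doing.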
I agree that the evidence suggests finite-width neural networks generalize a little better than infinite-width NTK regression. But attributing this to feature learning doesn’t follow. Couldn’t neural networks just have a slightly better implicit prior, for which the NTK is just an approximation?
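For what it’s worth, the comparison I have in mind is something like the following: take a small finite-width network, train it with gradient descent, and compare it against kernel regression with that same network’s empirical NTK at initialization (the linearized account of its training). A rough JAX sketch on a toy 1-D regression problem; the architecture, initialization scale, and hyperparameters are arbitrary illustrative choices, and the empirical finite-width NTK is only a stand-in for the infinite-width kernel:

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def init_mlp(key, d_in=1, width=256):
    k1, k2, k3 = jax.random.split(key, 3)
    return {"w1": jax.random.normal(k1, (d_in, width)) / jnp.sqrt(d_in),
            "b1": jax.random.normal(k2, (width,)),
            "w2": jax.random.normal(k3, (width, 1)) / jnp.sqrt(width)}

def f(params, x):
    # Two-layer ReLU network with a scalar output.
    return (jax.nn.relu(x @ params["w1"] + params["b1"]) @ params["w2"])[:, 0]

def empirical_ntk(params, x1, x2):
    # Theta(x1, x2) = J(x1) J(x2)^T, with J the Jacobian of the network
    # output with respect to all parameters (flattened into one vector).
    flat, unravel = ravel_pytree(params)
    grad_single = lambda x: jax.grad(lambda p: f(unravel(p), x[None])[0])(flat)
    return jax.vmap(grad_single)(x1) @ jax.vmap(grad_single)(x2).T

key = jax.random.PRNGKey(0)
params0 = init_mlp(key)
x_train = jnp.linspace(-3.0, 3.0, 30)[:, None]
y_train = jnp.sin(2 * x_train[:, 0])
x_test = jnp.linspace(-3.0, 3.0, 100)[:, None]
y_test = jnp.sin(2 * x_test[:, 0])

# (a) Kernel regression with the empirical NTK at initialization: the
# linearized ("lazy") account of training this network to fit the data.
k_tt = empirical_ntk(params0, x_train, x_train)
k_st = empirical_ntk(params0, x_test, x_train)
resid = y_train - f(params0, x_train)
ntk_pred = f(params0, x_test) + k_st @ jnp.linalg.solve(
    k_tt + 1e-4 * jnp.eye(len(x_train)), resid)

# (b) The same finite-width network actually trained by gradient descent.
loss = lambda p: jnp.mean((f(p, x_train) - y_train) ** 2)

@jax.jit
def step(p, lr=1e-2):
    return jax.tree_util.tree_map(lambda a, g: a - lr * g, p, jax.grad(loss)(p))

params = params0
for _ in range(20_000):
    params = step(params)

print("NTK regression test MSE:  ", float(jnp.mean((ntk_pred - y_test) ** 2)))
print("trained network test MSE: ", float(jnp.mean((f(params, x_test) - y_test) ** 2)))
```

Which of the two does better on a toy like this depends on the details; the point is just that both sides of the comparison are well-defined objects you can compute, and a small gap between them is compatible with “the true implicit prior is slightly better than the NTK.”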
If we could describe a prior such that Bayesian updating on that prior gave the same generalization behavior
Sure, I just think that any such prior is likely to explicitly or implicitly explain feature learning, since feature learning is part of what makes SGD-trained networks work.
But attributing this to feature learning doesn’t follow
I think it’s likely that dog-nose-detecting neurons play a part in classifying dogs, curve-detecting neurons play a part in classifying objects more generally, etc. This is all that is meant by ‘feature learning’: intermediate neurons changing in some functionally useful way. And feature learning is required for pre-training on a related task to be helpful, so it would be a weird coincidence if it were useless when training on a single task.
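To pin down what that claim means operationally, here is the kind of toy comparison I have in mind: train a small network on one task, then fit only a fresh linear readout on top of its frozen hidden activations for a related task, and compare against doing the same with the hidden activations at random initialization. Everything below (architecture, tasks, hyperparameters) is an arbitrary illustrative choice, and I’m not claiming the gap is large on a 1-D toy; the sketch is just to make “intermediate neurons changing in a functionally useful way” measurable:

```python
import jax
import jax.numpy as jnp

def init_mlp(key, d_in=1, width=16):
    k1, k2, k3 = jax.random.split(key, 3)
    return {"w1": jax.random.normal(k1, (d_in, width)) / jnp.sqrt(d_in),
            "b1": jax.random.normal(k2, (width,)),
            "w2": jax.random.normal(k3, (width, 1)) / jnp.sqrt(width)}

def features(params, x):
    # Hidden-layer activations: the candidate "learned features".
    return jax.nn.relu(x @ params["w1"] + params["b1"])

def forward(params, x):
    return (features(params, x) @ params["w2"])[:, 0]

def train(params, x, y, steps=10_000, lr=1e-2):
    loss = lambda p: jnp.mean((forward(p, x) - y) ** 2)
    step = jax.jit(lambda p: jax.tree_util.tree_map(
        lambda a, g: a - lr * g, p, jax.grad(loss)(p)))
    for _ in range(steps):
        params = step(params)
    return params

def readout_test_mse(feats_tr, y_tr, feats_te, y_te, ridge=1e-3):
    # Fit only a fresh linear output layer on top of frozen features.
    w = jnp.linalg.solve(feats_tr.T @ feats_tr + ridge * jnp.eye(feats_tr.shape[1]),
                         feats_tr.T @ y_tr)
    return float(jnp.mean((feats_te @ w - y_te) ** 2))

key = jax.random.PRNGKey(0)
x_train = jnp.linspace(-3.0, 3.0, 40)[:, None]
x_test = jnp.linspace(-3.0, 3.0, 200)[:, None]
task_a = lambda x: jnp.sin(2 * x[:, 0])          # "pre-training" task
task_b = lambda x: jnp.sin(2 * x[:, 0] + 0.7)    # related downstream task

params0 = init_mlp(key)
params_a = train(params0, x_train, task_a(x_train))

for name, p in [("features after training on task A", params_a),
                ("features at random initialization", params0)]:
    mse = readout_test_mse(features(p, x_train), task_b(x_train),
                           features(p, x_test), task_b(x_test))
    print(name, "-> task B test MSE:", mse)
```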
There are also plenty of examples from interpretability work of intermediate neurons changing in clearly functionally useful ways. I haven’t read it in detail, but this article analyzes how a particular family of neurons comes together to implement a curve-detecting algorithm, and it’s clear that the intermediate neurons have to change substantially for that circuit to work.