The linked post you wrote about classical learning theory states that the bounds PAC gives are far looser than what we observe in practice for neural networks.
In the post, you sketch some directions in which tighter bounds might be proven. My understanding is that these directions have not been pursued further.
Given all that, “Fully adequate account of generalization” seems like an overstatement, wouldn’t you agree?
As far as I can tell, the most we can say is that PAC gives a nice toy model for thinking about notions like generalization and learnability. Maybe I’m wrong (I’m not familiar with the literature), and I’d love to know more about what PAC and classical learning theory can tell us about neural networks.
I think that it gives us an adequate account of generalisation in the limit of infinite data (or, more specifically, in the case where we have enough data to wash out the influence of the inductive bias); this is what my original remark was about. I don’t think classical statistical learning theory gives us an adequate account of generalisation in the setting where the training data is small enough for our inductive bias to still matter, and it has only very limited things to say about out-of-distribution generalisation.
I see, thanks!