From reading your post, it seems that classical VC theory gives vacuous bounds for NN learning behaviour; correct me if I'm wrong. You say that the PAC formalism can be made more realistic, and suggest that non-vacuous bounds might then be provable. Do you have a reference where non-vacuous bounds are proved?
The bounds are not exactly vacuous; in fact, they are (in a sense) tight. However, they concern a somewhat adversarial setting, in which the data distribution may be chosen arbitrarily (including so as to be maximally opposed to the inductive bias of the learning algorithm). As a result, the bounds are much larger than what you would typically observe in practice, where typical problems are given to a learning algorithm whose inductive bias is attuned to their structure.
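To make concrete why the worst-case guarantee is so large, here is a representative form of the standard distribution-free VC bound (the exact constants vary across textbooks, so take this as a sketch of the shape of the result rather than a precise statement):

```latex
% Standard distribution-free VC generalization bound (representative form;
% constants differ between sources). With probability at least 1 - \delta
% over an i.i.d. sample S ~ D^m, simultaneously for every h in a hypothesis
% class H of VC dimension d:
\mathrm{err}_D(h) \;\le\; \widehat{\mathrm{err}}_S(h)
  \;+\; \sqrt{\frac{8\left(d \ln(2em/d) + \ln(4/\delta)\right)}{m}}
% Crucially, this holds for *every* distribution D: it is a guarantee against
% the worst case. Since d scales with the parameter count of a neural network,
% the square-root term dwarfs the error observed on typical problems unless
% the sample size m is astronomically large.
```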
If you move from this adversarial setting to a more probabilistic setting, in which you assume a fixed distribution over C or over Δ(X×{0,1}), then you may be able to prove tighter probabilistic bounds. However, I do not have any references to work where this has actually been done (and as far as I know, it has not been done before).
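To make the contrast concrete, here is one way the two settings might be formalized (a sketch only; the prior P over concepts, the learner A, and the sample S are my notation, not anything established above):

```latex
% Worst-case (classical PAC): the concept c and the input distribution D are
% chosen adversarially, so any guarantee must hold at the supremum:
\sup_{c \in C} \; \sup_{D \in \Delta(X)} \;
  \mathbb{E}_{S \sim (D,c)^m}\!\big[\mathrm{err}_{D,c}(A(S))\big]
% Average-case (the probabilistic setting suggested above): c is instead drawn
% from a fixed prior P over C, and we bound the expected error:
\mathbb{E}_{c \sim P} \;
  \mathbb{E}_{S \sim (D,c)^m}\!\big[\mathrm{err}_{D,c}(A(S))\big]
% (One could analogously place a fixed prior over joint distributions in
% \Delta(X \times \{0,1\}).) A learner whose inductive bias is attuned to P
% could satisfy a much smaller average-case bound than the worst-case
% supremum, even when the VC dimension of its hypothesis class is huge.
```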