My summary (endorsed by Jesse):
1. Empirical risk minimization (ERM) can be derived from Bayes by assuming your “true” distribution is close to a deterministic function plus probabilistic noise, but this derivation is usually left implicit (a sketch appears after this list)
2. Risk does not naively give a good inner product on function space: functions with similar risk under a given loss can be very different functions (see the numerical sketch after this list)
3. The choice of functional norm is important, but uniform convergence arguments default to the sup norm without examining that choice (the implicit sup is written out after this list)
4. There are other important properties of models/functions besides risk
5. Learning theory has failed to find tight (generalization) bounds, and bounds might not even be the right thing to study in the first place
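
On point 1, a minimal derivation sketch, assuming a Gaussian noise model (my choice of example, not necessarily the one the original argument had in mind): if the data are a deterministic function corrupted by i.i.d. Gaussian noise, then maximum likelihood (Bayes with a flat prior) reduces exactly to ERM with squared loss.

```latex
% Assume y_i = f_\theta(x_i) + \varepsilon_i with \varepsilon_i \sim \mathcal{N}(0, \sigma^2) i.i.d.
\[
\log p(y_{1:n} \mid x_{1:n}, \theta)
  = \sum_{i=1}^{n} \log \mathcal{N}\!\bigl(y_i \mid f_\theta(x_i), \sigma^2\bigr)
  = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} \bigl(y_i - f_\theta(x_i)\bigr)^2 + \mathrm{const},
\]
% so maximizing the likelihood (MAP with a flat prior over \theta) is the same
% as minimizing the empirical risk under squared loss:
\[
\hat{\theta}_{\mathrm{MLE}}
  = \arg\max_\theta \, \log p(y_{1:n} \mid x_{1:n}, \theta)
  = \arg\min_\theta \, \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - f_\theta(x_i)\bigr)^2 .
\]
```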
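On point 2, a small Python sketch with hypothetical predictors of my choosing: two functions with identical squared-error risk against the same target that nonetheless disagree by the maximum possible amount at many inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100_000)  # inputs drawn uniformly on [0, 1]
y = np.zeros_like(x)                     # target function: f(x) = 0

# Two predictors with the same squared-error risk:
g1 = 0.5 * np.ones_like(x)               # constant offset of +0.5
g2 = 0.5 * np.sign(np.sin(200 * x))      # rapidly oscillating, amplitude 0.5

risk1 = np.mean((g1 - y) ** 2)           # both risks equal 0.25 ...
risk2 = np.mean((g2 - y) ** 2)
gap = np.max(np.abs(g1 - g2))            # ... yet the functions differ by up to 1.0

print(f"risk(g1) = {risk1:.3f}, risk(g2) = {risk2:.3f}, max |g1 - g2| = {gap:.1f}")
```

Risk alone cannot distinguish g1 from g2, even though one is constant and the other oscillates wildly.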
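On point 3, the standard uniform convergence statement written out to make the implicit sup norm visible (standard notation, my reconstruction rather than anything quoted):

```latex
% With empirical risk \hat{R}_n(f) = \frac{1}{n}\sum_{i=1}^n \ell(f(x_i), y_i)
% and population risk R(f) = \mathbb{E}\,\ell(f(x), y), a standard uniform
% convergence guarantee has the form
\[
\Pr\!\left[\, \sup_{f \in \mathcal{F}} \bigl|\hat{R}_n(f) - R(f)\bigr| > \varepsilon \,\right] \le \delta .
\]
% The controlled quantity is literally a sup (L^\infty) norm of \hat{R}_n - R
% over the hypothesis class \mathcal{F}; a different norm over \mathcal{F}
% would give a different, and rarely studied, notion of convergence.
```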