Don’t quote me on this, but I believe the key insight is that the complexity of the max margin hyperplane model depends not on the number of dimensions of the feature space (which may be very large) but on the number of data points used to define the hyperplane (the support vectors), and the latter quantity is usually small. Though that realization is intuitively plausible, it required the VC-theory to actually prove.
Don’t quote me on this, but I believe the key insight is that the complexity of the max margin hyperplane model depends not on the number of dimensions of the feature space (which may be very large) but on the number of data points used to define the hyperplane (the support vectors), and the latter quantity is usually small. Though that realization is intuitively plausible, it required the VC-theory to actually prove.