The Support Vector Machine was not discovered until the 1990s.
Why not? I’m not familiar with VC-theory, but the basic idea of separating two sets of points with a hyperplane with the maximum margin doesn’t seem that complex. What made this difficult?
Don’t quote me on this, but I believe the key insight is that the complexity of the max-margin hyperplane model depends not on the number of dimensions of the feature space (which may be very large) but on the number of data points used to define the hyperplane (the support vectors), and the latter quantity is usually small. Though that realization is intuitively plausible, it took VC-theory to actually prove it.
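You can see that point empirically. Below is a minimal sketch using scikit-learn's `SVC` on a synthetic linearly separable dataset (the data-generation scheme is my own illustration, not anything from the thread): even with far more feature dimensions than data points, the fitted max-margin hyperplane ends up defined by only a handful of support vectors.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d = 200, 500  # far more feature dimensions than data points

# Random points, labeled by the sign of a random hyperplane's projection.
X = rng.normal(size=(n, d))
w = rng.normal(size=d)
y = (X @ w > 0).astype(int)

# Push each class away from the hyperplane so the data is cleanly separable.
w_hat = w / np.linalg.norm(w)
X += 2.0 * np.outer(2 * y - 1, w_hat)

# Large C approximates a hard-margin SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
print(f"dimensions: {d}, points: {n}, support vectors: {len(clf.support_)}")
```

The number of support vectors printed is a small fraction of the 200 training points, and is unrelated to the 500 feature dimensions; the model's effective complexity tracks the former, not the latter.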