Link: Interview with Vladimir Vapnik
I recently stumbled across this remarkable interview with Vladimir Vapnik, a leading light in statistical learning theory, one of the creators of the Support Vector Machine algorithm, and generally a cool guy. The interviewer obviously knows his stuff and asks probing questions. Vapnik describes his current research and also makes some interesting philosophical comments:
V-V: I believe that something drastic has happened in computer science and machine learning. Until recently, philosophy was based on the very simple idea that the world is simple. In machine learning, for the first time, we have examples where the world is not simple. For example, when we solve the “forest” problem (which is a low-dimensional problem) and use data of size 15,000 we get 85%-87% accuracy. However, when we use 500,000 training examples we achieve 98% correct answers. This means that a good decision rule is not a simple one; it cannot be described by very few parameters. This is actually a crucial point in the approach to empirical inference.
This point was very well described by Einstein, who said “when the solution is simple, God is answering”. That is, if a law is simple we can find it. He also said “when the number of factors coming into play is too large, scientific methods in most cases fail”. In machine learning we are dealing with a large number of factors. So the question is: what is the real world? Is it simple or complex? Machine learning shows that there are examples of complex worlds. We should approach complex worlds from a completely different position than simple worlds. For example, in a complex world one should give up explainability (the main goal in classical science) to gain better predictability.
R-GB: Do you claim that the assumption of mathematics and other sciences that there are very few and simple rules that govern the world is wrong?
V-V: I believe that it is wrong. As I mentioned before, the (low-dimensional) problem “forest” has a perfect solution, but it is not simple and you cannot obtain this solution using 15,000 examples.
Later:
R-GB: What do you think about the bounds on uniform convergence? Are they as good as we can expect them to be?
V-V: They are O.K. However, the main problem is not the bound. There are conceptual questions and technical questions. From a conceptual point of view, you cannot avoid uniform convergence arguments; it is a necessity. One can try to improve the bounds, but that is a technical problem. My concern is that machine learning is not only about technical things, it is also about philosophy: what is the science of a complex world about? The improvement of the bound is an extremely interesting problem from a mathematical point of view. But even if you get a better bound, it will not help to attack the main problem: what to do in complex worlds?
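For anyone who wants to see the effect Vapnik describes, a rough version of the experiment is easy to run. The sketch below assumes the “forest” problem is the UCI covertype dataset (scikit-learn’s fetch_covtype) and uses a random forest in place of whatever learner he had in mind, so the exact percentages will differ; the point is only that test accuracy keeps climbing well past 15,000 examples.

```python
# A minimal sketch (my construction, not Vapnik's) of the learning-curve
# experiment: train on progressively larger subsets and score on a fixed
# held-out test set.  Assumes the "forest" problem is the UCI covertype data.
from sklearn.datasets import fetch_covtype
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = fetch_covtype(return_X_y=True)          # ~581,000 examples, 54 features
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=50_000, random_state=0, stratify=y)

for n in (15_000, 100_000, 500_000):
    clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
    clf.fit(X_train[:n], y_train[:n])
    print(f"train size {n:>7,}: test accuracy {clf.score(X_test, y_test):.3f}")
```

If the accuracy keeps rising with n instead of plateauing at 15,000, that is exactly Vapnik’s point: the rule being learned is not compressible into a handful of parameters.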
Updated link: https://www.learningtheory.org/learning-has-just-started-an-interview-with-prof-vladimir-vapnik/ (found while looking up his very weird transfer-learning research).
This discussion, complex world vs. simple rules, is very old and goes back to Plato and Aristotle. Plato explained our ability to recognize each and every A: all concrete examples partake, in varying degrees, in the ideal A. As these ideals do not exist in the world of the senses, he postulated some kind of hyperreality, the world of ideas, where they exist in timeless perfection. Our souls come from there to this world, and all recognition is re-cognition. Of course, this stuff is hard to swallow for a programmer trying to build some damned machine. A good prototype is better than nothing, but the ideal A has so far eluded any constructivist attempt.
His critic Aristotle did not believe in the world of ideas. As a taxonomist, he described the camel by its attributes. If the distinguishing attributes are present, it’s a camel; else it’s not a camel. Characterizing an ‘A’ by its attributes has proved harder than it seems. What is an attribute? Which attributes are useful? Is this line a short line, or is it already long? Is this a round bow or an edge? Not every A looks like a pointy hat! Does a very characteristic feature compensate for the lack of three others? Even if we have good features, there may be no simple rules. There is the well-known “rule” that there is no rule without exception: even folk wisdom discourages any attempt to catch A-ness in a simple net of if and else.
We should not expect that the concept of A-ness can be expressed by such simple means. The set of all “A” (it exists as a Platonic set somewhere, or doesn’t it?) may be pulled back into R^n as a set of grayscale images, but do we really know about its geometric structure? For large n, it is a thin subset, a complicated geometric object. The metric of R^n will preserve its local structure, but that’s all. It does not tell us about the concept of A-ness. We should expect that large amounts of memory are necessary.
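To make the point about the metric concrete, here is a toy sketch of my own (the 8x8 bitmaps are hand-drawn stand-ins, not real glyph data): an ‘A’ shifted two pixels sideways can end up farther away in Euclidean pixel distance than a differently shaped letter drawn in the same position, because the metric only sees pixel overlap.

```python
# Toy illustration: Euclidean distance in pixel space measures overlap,
# not letter identity.  The 8x8 bitmaps are crude hand-drawn stand-ins.
import numpy as np

def glyph_A():
    img = np.zeros((8, 8))
    img[1, 3:5] = 1      # apex
    img[2:7, 2] = 1      # left stroke
    img[2:7, 5] = 1      # right stroke
    img[4, 3:5] = 1      # crossbar
    return img

def glyph_H():
    img = np.zeros((8, 8))
    img[1:7, 2] = 1      # left stroke
    img[1:7, 5] = 1      # right stroke
    img[4, 3:5] = 1      # crossbar
    return img

A, H = glyph_A(), glyph_H()
A_shifted = np.roll(A, 2, axis=1)        # the "same" A, two pixels to the right

dist = lambda a, b: np.linalg.norm(a - b)
print("||A - shifted A|| =", dist(A, A_shifted))   # large: almost no overlap
print("||A - H||         =", dist(A, H))           # small: strokes mostly coincide
```

With these particular bitmaps the shifted copy comes out more than twice as far from the original as the H does, even though any human reader would group the two A’s together. That is the sense in which the R^n metric preserves local structure but says nothing about A-ness.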
FYI (shameless plug): I’ve tried to illustrate my ideas about a connection between the topology of finite spaces, continuous maps, product and quotient spaces, and the factorization of classifying maps on my website, learning-by-glueing.com. It’s not finished; any comments are welcome.
Interesting quotes from the interview:
...
Scary thought: what if the rules for AI are so complex that it becomes impossible to build one, or to prove that an AI will be stable and/or friendly? If this turns out to be the case, then the singularity will never happen, and we have an explanation for the Fermi paradox.
It’s a legitimate possibility that FAI is just too hard for the human race to achieve from anything like our current state, so that (barring some fantastic luck) we’re either doomed to an extinction event, or to a “cosmic locust” future, or to something completely different.
In fact, I’d bet 20 karma against 10 that Eliezer would assign a probability of at least 1% to this being the case, and I’d bet 50 against 10 that Robin assigns a probability of 50% or greater to it.
However, if FAI is in fact too difficult, then the SIAI program seems to do no harm; and if it’s not too hard, it could do a world of good. (This is one benefit of the “provably Friendly” requirement, IMO.)
Nah, it’s all good. We’ll just ‘shut up and’ ‘use the try harder’.
This sounds a lot like True vs. Useful, again.
(Of course it’s a bit redundant to call it “machine” learning, since we are learning machines, and there’s little reason to assume that we don’t learn using mechanical processes optimized for multi-factor matching. And that would tend to explain why learning and skills don’t always transfer well between Theory and Practice.)