One thing I've been wondering about deep neural networks: to what extent are neural networks novel and non-obvious? To what extent has evolution, in inventing them, taught us something very important for AI? (I realize this counterfactual is hard to evaluate.)
That is, imagine a world like ours, but one in which, for some reason, no one had ever been sufficiently interested in neurons & the brain to make the basic findings about neural network architecture and its power, as Pitts & McCulloch did. Would anyone reinvent them, or some isomorphic algorithm, or discover superior statistical/machine-learning methods?
For example, Ilya comments elsewhere that he doesn't think much of neural networks inasmuch as they're relatively simple, 'just' a bunch of logistic regressions wired together in layers and adjusted to reduce error. True enough: for all the subtleties, even a big ImageNet-winning neural network is not that complex to implement; you don't have to be a genius to create some neural nets.
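(To make the "logistic regressions wired together in layers and adjusted to reduce error" description concrete, here is a minimal toy sketch of my own, assuming nothing beyond numpy; the data, layer sizes, and learning rate are arbitrary illustrations: two layers of logistic units trained by plain gradient descent on squared error, on XOR, which a single logistic regression cannot fit.)

```python
# A tiny "stack of logistic regressions": 2 inputs -> 3 hidden logistic units -> 1 output,
# trained by plain gradient descent to reduce squared error on XOR.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(0, 1, (2, 3)), np.zeros(3)   # first "layer of logistic regressions"
W2, b2 = rng.normal(0, 1, (3, 1)), np.zeros(1)   # second one, stacked on top

lr = 1.0
for step in range(10000):
    h = sigmoid(X @ W1 + b1)      # hidden activations
    out = sigmoid(h @ W2 + b2)    # network output
    err = out - y                 # "adjusted to reduce error": backpropagate squared error
    d_out = err * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(0)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(0)

# Typically ends up near [0, 1, 1, 0]; plain gradient descent can occasionally stall
# in a poor local optimum.
print(np.round(out, 2).ravel())
```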
Yet, offhand, I'm having a hard time thinking of any non-neural-network algorithms which operate like a neural network in putting together a lot of little things in layers and achieving high performance. That's unlike any of your usual regressions or tests; multi-level models aren't very close; random forests, bagging, and factor analysis may be universal or consistent, but they are 'flat'...
Nor do I see many instances of people proposing new methods which turn out to just be a convolutional network with nodes and hidden layers renamed. (A contrast here would be Turing's halting theorem: it seems like you can't throw a stick among language or systems papers without hitting a system complicated enough to be Turing-complete and hence undecidable, and like there was a small cottage industry post-Turing of showing that yet another system could be turned into a Turing machine, or that a result could be interpreted as proving something well-known about Turing machines.) There don't seem to be 'multiple inventions' here, as if the paradigm were non-obvious and would not have been hit upon without the biological inspiration.
So if humanity had had no biological neural networks to steal the general idea and as proof of feasibility, would machine learning & AI be far behind where they are now?
This 2007 talk by Yann LeCun, Who is Afraid of Non-Convex Loss Functions?, seems very relevant to your question. I'm far from an ML expert, but here's my understanding from that talk and various other sources. Basically, there is no theoretical reason to think that deep neural nets can be trained for any interesting AI task, because their loss functions are not convex, so there's no guarantee that when you try to optimize the weights you won't get stuck in local minima or flat spots. People tried to use DNNs anyway and suffered from those problems in practice as well, so the field almost gave them up entirely and limited itself to convex methods (such as SVMs and logistic regression) which don't have these optimization problems but do have other limitations. It eventually turned out that if you apply various tricks, good-enough local optima can be found for DNNs for certain types of AI problems. (Far from "you don't have to be a genius to create some neural nets", those tricks weren't easy to find; otherwise it wouldn't have taken so long!)
Without biological neural networks as inspiration and proof of feasibility, I guess people probably still would have had the idea to put things in layers and try to reduce error, but would have given up more completely when they hit the optimization problems, and nobody would have found those tricks until much later when they exhausted other approaches and came back to deep nets.
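To make the non-convexity worry concrete, here is a toy illustration of my own (a one-dimensional function, nothing like a real DNN loss surface): the same gradient-descent procedure, started from different points, settles into different minima with different final losses, which is exactly the lack of guarantees the talk is about.

```python
# Toy illustration of the non-convexity worry: gradient descent on a 1-D non-convex
# function ends up in different minima depending on where it starts.
import numpy as np

f  = lambda w: np.sin(3 * w) + 0.1 * w**2    # non-convex: several dips
df = lambda w: 3 * np.cos(3 * w) + 0.2 * w   # its derivative

def descend(w, lr=0.01, steps=2000):
    for _ in range(steps):
        w -= lr * df(w)
    return w

for start in (-3.0, 0.0, 3.0):
    w = descend(start)
    print(f"start {start:+.1f} -> w = {w:+.3f}, loss = {f(w):.3f}")
# Different starts give different final losses: no guarantee of finding the global minimum.
```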
The only crossover that comes to mind for me is vision deep learning 'discovering' edge detection. There is also some interest in sparse NN activation.
I don't think we would be that far behind. NNs had lost favor in the AI community after 1969 (Minsky & Papert's Perceptrons) and have only become popular again in the last decade; see http://en.wikipedia.org/wiki/Artificial_neural_network
Yes, I'm familiar with the history. But how far would we be without the neural network work done since ~2001? The non-neural-network competitors on ImageNet, like SVMs, are nowhere near human levels of performance; Watson required neural networks; Stanley won the DARPA Grand Challenge without neural networks because it had so many sensors, but real self-driving cars will have to use neural networks; neural networks are why Google Translate has gone from roughly Babelfish levels (hysterically bad) to remarkably good; voice recognition has gone from mostly hypothetical to routine on smartphones...
What major AI achievements have SVMs or random forests racked up over the past decade comparable to any of that?
So if humanity had had no biological neural networks to steal the general idea and as proof of feasibility, would machine learning & AI be far behind where they are now?
NNs are popular now for their deep learning properties and ability to learn features from unlabeled data (like edge detection).
Comparing NNs to SVMs isn't really fair. You use the tool best for the job. If you have lots of labeled data you are more likely to use an SVM. It just depends on what problem you are being asked to solve. And of course you might feed an NN's output into an SVM or vice versa.
As for major achievements—NNs are leading for now because 1) most of the world's data is unlabeled and 2) automated feature discovery (deep learning) is better than paying people to craft features.
NNs' connection to biology is very thin. Artificial neurons don't look or act like regular neurons at all. But as a coined term to sell your research idea, it's great.
I am well aware of that. Nevertheless, as a historical fact, they were inspired by real neurons, they do operate more like real neurons than do, say, SVMs or random forests, and this is the background to my original question.
If you have lots of labeled data you are more likely to use an SVM.
ImageNet is a lot of labeled data, to give one example.
As for major achievements—NNs are leading for now because …
There is a difference between explaining and explaining away. You seem to think you are doing the latter, while you're really just doing the former.
SVMs are O(n^3); if you have lots of data you shouldn't use SVMs.
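As a rough sketch of where that scaling claim comes from (my own illustration, with arbitrary sizes): a kernel SVM has to work against the full n x n Gram matrix, so memory and kernel evaluations alone grow quadratically with the number of examples, and the exact QP solve on top of that is the part commonly quoted as roughly O(n^2) to O(n^3).

```python
# Rough sketch of why kernel SVMs get painful at scale: the RBF Gram matrix alone is n x n.
import numpy as np

def rbf_gram(X, gamma=0.1):
    # K[i, j] = exp(-gamma * ||x_i - x_j||^2): n^2 entries before any QP solving happens.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

for n in (1_000, 2_000, 4_000):   # doubling n quadruples the kernel matrix
    X = np.random.default_rng(0).normal(size=(n, 20))
    K = rbf_gram(X)
    print(f"n = {n}: Gram matrix {K.shape}, ~{K.nbytes / 1e6:.0f} MB")
```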
What year do you put the change in Google Translate? It didn't switch to neural nets until 2012, right? Did anyone notice the change? My memory is that it was dramatically better than Babelfish in 2007, let alone 2010.
Good question… I know that Google Translate began as a pretty bad outsourced translator (SYSTRAN) because I had a lot of trouble figuring out when Translate first came out for my Google survival analysis, and it began being upgraded and expanded almost constantly from ~2002 onwards. The 2007 switch was supposedly from the company SYSTRAN to an internal system, but what does that mean? SYSTRAN is a proprietary company which could be using anything it wants internally, and admits it’s a hybrid system. The 2006 beta just calls it statistics and machine learning, with no details about what this means. Google Scholar’s no help here either—hits are swamped by research papers mentioning Translate, and a few more recent hits about the neural networks used in various recent Google mobile-oriented services like speech or image recognition.
So… I have no idea. Highly unlikely to predate their internal translator in 2006, anyway, but it could be your 2012 date.
Here is a 2007 paper that I found when I was writing the above. I don't remember how I found it, or why I think it representative, though.
How many components go into "neural nets"?
At the very least, there are networks of artificial neurons. You seem to accept Ilya's dismissal of the artificial neuron as too simple to credit, but take the networks as the biologically inspired part. I view those components exactly the other way around.
Networks of simple components come up everywhere. There were circuits of electrical components a century ago. A parsed computer program is a network of simple components. Many people doing genetic programming (inspired by biology, but not neurology) work with such trees or networks. Selfridge’s Pandemonium (1958) advocated features built of features, but I think it was inspired by introspective psychology, not neuroscience.
Whereas the common artificial neuron seems crazy to me. It doesn’t matter how simple it is, if it is unmotivated. What seems crazy to me is the biologically inspired idea of a discrete output. Why have a threshold or probabilistic firing in the middle of the network? Of course, you want something like that at the very end of a discrimination task, so maybe you’d think of recycling it into the middle, but not me. I have heard it described as a kind of regularization, so maybe people would have come up with it by thinking about regularization. Or maybe it could be replaced with other regularizations. And a lot of methods have been adapted to real outputs, so maybe the discrete outputs didn’t matter.
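For what it's worth, the standard non-biological motivation for some kind of squashing in the middle is easy to check numerically: without a nonlinearity between layers, a stack of linear layers collapses into a single linear map. A minimal numpy check of that point (my own sketch; it says nothing about whether the nonlinearity needs to be a threshold or probabilistic firing in particular):

```python
# Without a nonlinearity between layers, a "deep" stack of linear layers is just one linear layer.
import numpy as np

rng = np.random.default_rng(0)
W1, W2, W3 = (rng.normal(size=(5, 5)) for _ in range(3))
x = rng.normal(size=5)

deep_linear = W3 @ (W2 @ (W1 @ x))   # three stacked linear "layers"...
collapsed   = (W3 @ W2 @ W1) @ x     # ...equal one linear layer with weights W3 W2 W1
print(np.allclose(deep_linear, collapsed))      # True

# Insert any squashing function (hard threshold, sigmoid, etc.) between layers and the
# collapse no longer happens, which is the usual argument for a mid-network nonlinearity.
sigmoid = lambda z: 1 / (1 + np.exp(-z))
deep_nonlinear = W3 @ sigmoid(W2 @ sigmoid(W1 @ x))
print(np.allclose(deep_nonlinear, collapsed))   # False
```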
So that's the "neural" part and the "network" part, but there are a lot more algorithms that go into recent work. For example, Boltzmann machines are named as if they come from physics, but supposedly they were invented by a neuroscientist because they can be trained in a local way that is biologically realistic. (Except I think it's only RBMs that have that property, so the neuroscientist failed in the short term, or the story is complete nonsense.) Markov random fields did come out of physics, and maybe they could have led to everything else.
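On the "trained in a local way" point: for RBMs at least, the usual contrastive-divergence update for a weight uses only the activities of the two units that weight connects. A minimal sketch of my own (toy binary data, bias terms omitted for brevity, arbitrary hyperparameters), just to show what "local" means here:

```python
# Minimal Bernoulli RBM trained with one step of contrastive divergence (CD-1).
# The point: the update for weight W[i, j] uses only the activities of the two units
# it connects (visible i, hidden j), i.e. a "local" learning rule.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

n_visible, n_hidden, lr = 6, 3, 0.1
W = rng.normal(0, 0.1, (n_visible, n_hidden))

# toy binary data: two repeating patterns
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]] * 50, dtype=float)

for epoch in range(200):
    for v0 in data:
        ph0 = sigmoid(v0 @ W)                         # hidden probabilities given the data
        h0 = (rng.random(n_hidden) < ph0).astype(float)
        v1 = sigmoid(W @ h0)                          # "reconstruction" of the visible units
        ph1 = sigmoid(v1 @ W)
        # CD-1 update: difference of two local pairwise products <v_i h_j>
        W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))

# The two patterns typically end up driving the hidden units in visibly different ways.
print(np.round(sigmoid(data[:2] @ W), 2))
```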