This 2007 talk by Yann LeCun, Who is Afraid of Non-Convex Loss Functions?, seems very relevant to your question. I’m far from an ML expert, but here’s my understanding from that talk and various other sources. Basically, there was no theoretical reason to think that deep neural nets could be trained for any interesting AI task, because their loss functions are not convex, so there’s no guarantee that when you try to optimize the weights you won’t get stuck in local minima or flat spots. People tried to use DNNs anyway and suffered from those problems in practice as well, so the field gave them up almost entirely and limited itself to convex methods (such as SVMs and logistic regression), which don’t have these optimization problems but do have other limitations. It eventually turned out that if you apply various tricks, good enough local optima can be found for DNNs for certain types of AI problems. (Far from “you don’t have to be a genius to create some neural nets”, those tricks weren’t easy to find; otherwise it wouldn’t have taken so long!)
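To make the “stuck in local minima” worry concrete, here is a toy sketch (my own illustration, not from the talk): plain gradient descent on a one-dimensional non-convex function I picked for the purpose, f(x) = x⁴ − 3x² + x, which has two local minima. Depending on where you start, you converge to different minima, and one of them is much worse, which is exactly the guarantee you lose when the loss isn’t convex.

```python
def grad_descent(grad, x0, lr=0.01, steps=2000):
    """Follow the negative gradient from x0 and return the final point."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Non-convex toy objective: f(x) = x^4 - 3x^2 + x has two local minima.
f = lambda x: x**4 - 3 * x**2 + x
df = lambda x: 4 * x**3 - 6 * x + 1   # its derivative

for start in (-2.0, 2.0):
    x_final = grad_descent(df, start)
    print(f"start={start:+.1f} -> x={x_final:+.3f}, f(x)={f(x_final):+.3f}")

# Roughly:
#   start=-2.0 -> x=-1.300, f(x)=-3.514   (the better minimum)
#   start=+2.0 -> x=+1.131, f(x)=-1.070   (stuck in the worse one)
```

With a convex objective like f(x) = x², every starting point would end up at the same global minimum; with high-dimensional neural net losses you get the non-convex picture above, only far worse.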
Without biological neural networks as inspiration and proof of feasibility, I guess people probably would still have had the idea of stacking things in layers and trying to reduce error, but they would have given up more completely when they hit the optimization problems, and nobody would have found those tricks until much later, after other approaches had been exhausted and people came back to deep nets.