ML is search. With more parameters you can do more, but the search problem gets harder. Deep NN is a way to parallelize that search across grad students (architecture tweaks, etc.), and also a general template for guiding local-search-via-gradient (e.g. making it look for “interesting” features in the data).
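To make “local-search-via-gradient” concrete, here is a minimal sketch in plain numpy; the toy data, loss, and step size are all made up for illustration:

    import numpy as np

    # Toy "model": a single parameter w, fit by local search on a loss surface.
    rng = np.random.default_rng(0)
    x = rng.normal(size=100)
    y = 3.0 * x + rng.normal(scale=0.1, size=100)   # true slope is 3

    w = 0.0     # start somewhere
    lr = 0.1    # step size for the local search
    for _ in range(100):
        grad = np.mean(2 * (w * x - y) * x)   # gradient of mean squared error
        w -= lr * grad                        # take a small step downhill
    print(w)    # ends up near 3.0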
I don’t mean to be disparaging, btw. I think it is an important innovation to use human AND computer time intelligently to solve bigger problems.
In some sense it is voodoo (not very interpretable) but so what? Lots of other solutions to problems are, too. Do you really understand how your computer hardware or your OS works? So what if you don’t?
There is research in that direction, particularly on convolutional networks for visual object recognition. It is possible to interpret what a neural net is looking for.
http://yosinski.com/deepvis
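Roughly, the activation-maximization idea behind that kind of visualization is: do gradient ascent on the input to find a pattern that makes a chosen unit fire. The tiny untrained conv net below is only a stand-in (and assumes PyTorch); with a real trained model the resulting image shows what the unit responds to:

    import torch
    import torch.nn as nn

    # Stand-in convolutional net (untrained); in practice you'd load a trained model.
    net = nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=5, padding=2),
        nn.ReLU(),
        nn.Conv2d(8, 16, kernel_size=5, padding=2),
        nn.ReLU(),
    )

    # Start from noise and ask: what image maximizes channel 0's mean activation?
    img = torch.randn(1, 3, 64, 64, requires_grad=True)
    opt = torch.optim.Adam([img], lr=0.05)
    for _ in range(200):
        opt.zero_grad()
        act = net(img)[0, 0].mean()     # activation of one unit/channel
        (-act).backward()               # gradient *ascent* on the activation
        opt.step()
    # `img` now roughly shows the pattern that unit responds to.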
I guess the difference is that an RNN might not be understandable even by the person who created and trained it.
There is an interesting angle to this—I think it maps to the difference between (traditional) statistics and data science.
In traditional stats you are used to small, parsimonious models. In these small models each coefficient, each part of the model, is separable in a way: it is meaningful and interpretable by itself. The big thing to avoid is overfitting.
In data science (and/or ML) a lot of models are of the sprawling black-box kind, where coefficients are not separable and make no sense outside the context of the whole model. These models aren’t parsimonious in the traditional sense either. Also, because many of the usual metrics scale badly to large datasets, overfitting has to be managed differently.
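A small sketch of the contrast, assuming scikit-learn and made-up data: in the linear model each coefficient reads directly as “effect of this feature”, while the forest has thousands of internal parameters that mean nothing in isolation:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))
    y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=500)

    # Parsimonious model: each coefficient is interpretable on its own.
    lin = LinearRegression().fit(X, y)
    print(lin.coef_)          # roughly [ 2, -1, 0 ]

    # Black-box model: no single parameter means anything by itself.
    rf = RandomForestRegressor(n_estimators=100).fit(X, y)
    print(sum(t.tree_.node_count for t in rf.estimators_))  # thousands of split nodes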
Keep in mind that traditional stats also includes semi-parametric and non-parametric methods. These give you models which basically manage overfitting by making complexity scale with the amount of data, i.e. they’re by no means “small” or “parsimonious” in the general case. And yes, they’re more similar to the ML stuff but you still get a lot more guarantees.
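For example (a toy numpy sketch, bandwidth picked arbitrarily): in Nadaraya-Watson kernel regression the “model” is essentially the training set itself, so its complexity grows with the data, and the bandwidth is what keeps overfitting in check:

    import numpy as np

    def kernel_regression(x_train, y_train, x_query, bandwidth=0.3):
        # Nadaraya-Watson estimator: each prediction is a weighted average of
        # *all* training targets, so the model grows with the data.
        d = x_query[:, None] - x_train[None, :]
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        return (w * y_train).sum(axis=1) / w.sum(axis=1)

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 2 * np.pi, 200)
    y = np.sin(x) + rng.normal(scale=0.2, size=200)
    x_new = np.linspace(0, 2 * np.pi, 5)
    print(kernel_regression(x, y, x_new))   # roughly sin(x_new)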
I get the impression that ML folks have to be way more careful about overfitting because their methods are not going to find the ‘best’ fit—they’re heavily non-deterministic. This means that an overfitted model has basically no real chance of successfully extrapolating from the training set. This is a problem that traditional stats doesn’t have—in that case, your model will still be optimal in some appropriate sense, no matter how low your measures of fit are.
I think I am giving up on correcting “google/wikipedia experts”; it’s just a waste of time, and a losing battle anyway. (I mean the GP here.)
That said, this does not make sense to me. Bias-variance tradeoffs are fundamental everywhere.
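The tradeoff is easy to see in a few lines either way; a classic toy demonstration (numpy only, degrees and noise level picked arbitrarily) with polynomial degree as the knob:

    import numpy as np

    rng = np.random.default_rng(0)
    x_train = rng.uniform(-1, 1, 30)
    y_train = np.sin(3 * x_train) + rng.normal(scale=0.2, size=30)
    x_test = rng.uniform(-1, 1, 200)
    y_test = np.sin(3 * x_test) + rng.normal(scale=0.2, size=200)

    for degree in (1, 4, 15):
        coeffs = np.polyfit(x_train, y_train, degree)
        train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
        test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
        # low degrees underfit (high bias); very high degrees tend to overfit (high variance)
        print(degree, round(train_err, 3), round(test_err, 3))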
I don’t think any one person understands the Linux kernel anymore. It’s just too big. Same with modern CPUs.
An RNN is something that one person can create and then fail to understand. That’s not like the Linux kernel at all.
Correction: An RNN is something that a person working with a powerful general optimizer can create and then fail to understand.
A human without the optimizer can create RNNs by hand—but only of the small and simple variety.
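For illustration, a hand-built RNN of that small and simple variety, with the weights chosen by a human rather than found by an optimizer; it just latches onto whether a 1 has appeared in the input stream:

    # One hidden unit, a step nonlinearity, weights picked by hand.
    W_h, W_x, b = 1.0, 1.0, -0.5

    def step(z):
        return 1.0 if z > 0 else 0.0

    def run(xs):
        h = 0.0
        for x in xs:
            h = step(W_h * h + W_x * x + b)   # standard recurrent update
        return h

    print(run([0, 0, 0]))     # 0.0 -- never saw a 1
    print(run([0, 1, 0, 0]))  # 1.0 -- latched after the 1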
Although the Linux kernel and modern CPUs are piecewise-understandable, whereas neural networks are not.
At the individual vertex level, lots of neural networks are a logistic regression model, or something similar -- I think I understand those pretty well. Similarly: “I think I understand 16-bit adders pretty well.”
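In code, that per-vertex view is just a weighted sum pushed through a sigmoid, which is exactly the logistic regression form (numbers below are made up):

    import numpy as np

    def neuron(x, w, b):
        # One "vertex" of a neural network: a weighted sum through a sigmoid.
        # Taken alone, this is exactly a logistic regression model.
        return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

    x = np.array([0.5, -1.2, 3.0])   # inputs
    w = np.array([0.8, 0.1, -0.4])   # weights
    b = 0.2
    print(neuron(x, w, b))           # a probability-like output in (0, 1)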