The relationship between F and B is not like the relationship between Aristotelian physics and relativity. Not at all.
I’m very tempted to argue that it is!
But what I wanted to convey is that it feels like I’m supposed to learn something which is manifestly inferior, in its logical foundation, to what is already known and available.
And maybe, under the constraint of computational cost, the Bayesian and the frequentist approaches end up in the same place, but where’s the proof? Where’s the place where someone says: “This is Bayesian machine learning, but it’s computationally too costly. So by making such-and-such simplifying assumptions, we end up with frequentist machine learning.”?
Instead, what I read are things like: “In practice, Bayesian optimization has been shown to obtain better results in fewer experiments than grid search and random search” (from here).
I would urge you to follow ChristianKI’s advice, since I suspect you probably know much less than you think you know about either Bayesian or frequentist statistics. Perhaps you could explain in your own words why exactly it is clear that the ML book you are reading is “manifestly inferior” to your preferred approach?
Also consider reading this: A Fervent Defense of Frequentist Statistics.
There is a bit of confusion here. I’m not stating that frequentist machine learning is inferior to Bayesian machine learning. I’m stating that Bayesian probability is superior to frequentist probability.
Why do I say this? Because in all the cases that I know of, either a Bayesian model can be reduced to a frequentist one, or the Bayesian model gives more accurate predictions.
That said, not even this is a problem. Since I’m learning the subject, I’m not at the stage of saying “this sentence is wrong”. I’m at the stage of “this sentence doesn’t make sense in the context of Bayesianism”. So I’m asking “is there a book that teaches ML from a Bayesian point of view?”.
The answer I’m discovering, appallingly but maybe not so, is no.
As for the fervent defence, under the premises elucidated in the comments, I hold none of the myths, so it doesn’t apply.
I typically see this stated as “there is a Bayesian interpretation for every effective statistical technique.” As pointed out elsewhere, typically people use “frequentist” to mean “non-Bayesian,” which is not particularly effective as a classification.
Did you google Bayesian Machine Learning, or search for it on Amazon? Barber is a well-rated textbook available online for free. (I haven’t read it; Sebastien Bratieres thinks it’s comparable to Murphy, the second most popular ML book, which is Bayesian.) Incidentally, Bishop, the most popular ML book, is also Bayesian. You managed to find the only ML textbook I’ve seen which has, as a comment in one of the Amazon reviews, a positive comment that the book is not Bayesian!
The more meta point here is to not let a worldview shut you out from potentially useful resources. Yes, Bayesianism is the best philosophy of probability, but that does not mean it is the most effective practice of statistics, and excluding concepts or practices from your knowledge of statistics because of a disagreement on philosophy is parochial and self-limiting.
Reducing a frequentist model to a Bayesian one, though, is not a pointless exercise, since it elucidates the hidden assumptions, and at least you become better aware of the model’s field of applicability.
Only after buying the book I have :/ Bishop, though, seems very interesting, thanks!
Thankfully, I’m learning ML for my own education; it’s not something I need to practice right now.
You’re welcome! I should point out that the other words I was considering using to describe Bishop are “classic” and “venerable”—it’s not out of date (most actively used ML methods are surprisingly old), but you may want to read it in parallel with Barber. (In general, if you’ve never read textbooks in parallel before, I recommend it as a lesson in textbook design / pedagogy.)
Using Bishop in my class this Fall, very popular for good reason.
I think it’s very useful, when you are a beginner, to be able to listen to someone with domain expertise telling you when you are wrong.
But then I’m allowed to ask “why?”, and if the answer is “because I say so”, then I feel pretty confident to dismiss the expert.
But that’s not even the stage I’m at. A book is not an interactive medium, so the exchange has gone like this:
book: Cross-validation!
me: “Gaaaak! That sounds totally wrong! Is there anyone who can explain to me either why this is right or, if it’s actually wrong, what the correct approach is?”
I’m still searching for an answer...
Try this paper or page 403 of this textbook.
Also, although in this case there seems to be an available answer, I don’t think it makes sense to always expect that. Sometimes people find a technique that tends to work in practice and then only later come up with a theoretical explanation of why it works. If you happen to live in the period in between...
Heh! I’ve suddenly remembered that LW was founded exactly because the fields of AI and ML used too much frequentist (il)logic. The Sequences were meant to restore sanity to the field.
Anyway, the textbook you mentioned seems pretty cool, thank you very much!
I’m no expert at machine learning. However, as far as I remember, the point of doing cross-validation is to find out whether your model is robust. Robustness is not a standard “Bayesian” concept. Maybe you don’t appreciate its value?
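To make the technique being argued about concrete, here is a minimal sketch of k-fold cross-validation in plain Python. All the names (`train_fn`, `error_fn`, the toy data) are illustrative, not from any particular library:

```python
import random

def k_fold_cross_validate(data, k, train_fn, error_fn):
    """Average held-out error over k folds.  All names are illustrative:
    data is a list of (x, y) pairs, train_fn(train) returns a fitted model,
    error_fn(model, test) returns that model's error on held-out pairs."""
    data = data[:]                            # don't mutate the caller's list
    random.shuffle(data)
    folds = [data[i::k] for i in range(k)]    # k roughly equal folds
    errors = []
    for i in range(k):
        test = folds[i]
        train = [pair for j, fold in enumerate(folds) if j != i for pair in fold]
        model = train_fn(train)
        errors.append(error_fn(model, test))
    return sum(errors) / k

# Toy usage: the "model" is just the training mean of y, scored by squared error.
data = [(x, 2.0 * x) for x in range(20)]
train_mean = lambda train: sum(y for _, y in train) / len(train)
mse = lambda m, test: sum((y - m) ** 2 for _, y in test) / len(test)
cv_error = k_fold_cross_validate(data, k=5, train_fn=train_mean, error_fn=mse)
```

The point of the procedure is exactly the robustness check described above: every data point gets scored by a model that never saw it during training, so the averaged error estimates out-of-sample performance rather than fit to the training set.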
I would appreciate it if there were an explanation of why something is done the way it is. Instead, it’s all about learning the passwords. Maybe it’s just that the main textbook in the field is pedagogically bad; it wouldn’t be the first time.
Getting a deep understanding of a complex field like machine intelligence isn’t easy. You shouldn’t expect it to be easy, or something that you can acquire in a few days.
This is probably very arrogant of me to say, but my advice would be: “Listen to the domain expert when he tells you what you should do… and then find a Bayesian and let them explain to you why that works.”
In my defense, this was my personal experience with statistics at school. I was very good at math in general, but statistics somehow didn’t “click”. I always had this feeling as if what was explained was built on some implicit assumptions that no one ever mentioned explicitly, so unlike with the rest of math, I had no choice here but to memorize that in situation x you should do y, because, uhm, that’s what my teachers told me to do.

More than ten years later, I read LW, and here I am told that yes, the statistics I was taught does have implicit assumptions, and suddenly it all makes sense. And it makes me very angry that no one told me this stuff at school.

I am a “deep learner” (this, not this), and I have a problem learning something when I am told how but can’t find out why. Most people probably don’t have a problem with this: they are told how, and they do, and can be quite successful with it; probably later they will also get an idea of why. But I need to understand the stuff from the very beginning, otherwise I can’t do it well. Telling me to trust a domain expert does not help; I may place a lot of confidence in the how, but I still don’t know why.
ChristianKI is not telling you to trust a domain expert, but rather to read / listen to the domain expert long enough to understand what they are saying (rather than instantly assuming they are wrong because they say something that seems to conflict with your preconceived notions).
I think if you were to read most machine learning books, you would get quite a lot of “why”. See this manuscript for instance. I don’t really see why you think that Bayesians have a monopoly on being able to explain things.
I think you make a mistake if you put a school teacher who doesn’t understand statistics on a deep level into the same category as academic machine learning experts who don’t happen to be “Bayesians”.
Ok, thank you for your time.
There is the probabilistic programming community, which uses clean tools (programming languages) to hand-construct models with many unknown parameters. They use approximate Bayesian methods for inference, and they are slowly improving the efficiency and scalability of those techniques.
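As a toy illustration of the kind of approximate Bayesian inference these systems automate, here is a hand-rolled Metropolis sampler for a coin’s unknown bias. The model, the data (7 heads in 10 flips), and all the names are hypothetical, chosen only to show the mechanics:

```python
import math
import random

random.seed(0)

# Hypothetical toy model: a coin with unknown bias p, flat prior on (0, 1),
# observed HEADS heads out of FLIPS flips.
HEADS, FLIPS = 7, 10

def log_posterior(p):
    if not 0.0 < p < 1.0:
        return float("-inf")  # zero prior mass outside (0, 1)
    return HEADS * math.log(p) + (FLIPS - HEADS) * math.log(1.0 - p)

# Metropolis random walk: propose a nearby p, accept with prob min(1, ratio).
samples, p = [], 0.5
for _ in range(20000):
    proposal = p + random.gauss(0.0, 0.1)
    log_ratio = log_posterior(proposal) - log_posterior(p)
    if random.random() < math.exp(min(0.0, log_ratio)):
        p = proposal
    samples.append(p)

# Discard burn-in; the mean should approach the exact Beta(8, 4) mean, 2/3.
posterior_mean = sum(samples[5000:]) / len(samples[5000:])
```

A probabilistic programming language lets you state just the model (the `log_posterior` part) and supplies the inference loop for you; the efficiency/scalability work mentioned above is largely about replacing this naive random walk with smarter samplers or variational approximations.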
Then there is the neural net &amp; optimization community, which uses general automated models. It is more ‘frequentist’ (or perhaps just ad hoc), but there are now some Bayesian inroads there too. That community has the most efficient and scalable learning methods, but it isn’t always clear what tradeoffs they are making.
And even in the ANN world, you sometimes see Bayesian statistics brought in to justify regularizers or to derive things, such as in variational methods. But then for the actual learning they take gradients and use SGD, with the understanding that SGD somehow approximates the Bayesian inference step, or at least does something close enough.
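The best-known instance of “Bayesian statistics brought in to justify regularizers” is that L2 regularization (weight decay) coincides with MAP estimation under a zero-mean Gaussian prior on the weights. A minimal one-parameter sketch, with made-up data values:

```python
# Ridge regression with a single weight w: minimizing
#     sum_i (y_i - w * x_i)**2 + lam * w**2
# has the closed form below.  The same w is the MAP estimate under a
# Gaussian likelihood plus a zero-mean Gaussian prior on w, because the
# log-posterior equals the negative regularized loss up to constants.
# The data values here are made up for illustration.
def ridge_w(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

w_unreg = ridge_w(xs, ys, lam=0.0)  # plain least squares
w_map = ridge_w(xs, ys, lam=5.0)    # shrunk toward the prior mean, 0
```

Larger `lam` corresponds to a tighter prior, pulling the estimate toward zero; with `lam=0` the prior is flat and you recover the ordinary least-squares answer.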