The relationship between F and B is not like the relationship between Aristotelian physics and relativity. Not at all.
I’m very tempted to argue that it is!
But what I wanted to convey is that it feels like I’m supposed to learn something which is manifestly inferior, in its logical foundation, to what is already known and available.
And maybe, under the constraint of computational cost, the Bayesian and the frequentist approaches end up in the same place, but where’s the proof? Where’s the place where someone says: “This is Bayesian machine learning, but it’s computationally too costly. So by making such-and-such simplifying assumptions, we end up with frequentist machine learning.”?
Instead, what I read are things like: “In practice, Bayesian optimization has been shown to obtain better results in fewer experiments than grid search and random search” (from here).
I would urge you to follow ChristianKI’s advice, since I suspect you probably know much less than you think you know about either Bayesian or frequentist statistics. Perhaps you could explain in your own words why exactly it is clear that the ML book you are reading is “manifestly inferior” to your preferred approach?
Also consider reading this: A Fervent Defense of Frequentist Statistics.
There is a bit of confusion here. I’m not stating that frequentist machine learning is inferior to Bayesian machine learning. I’m stating that Bayesian probability is superior to frequentist probability.
Why do I say this? Because in all the cases that I know of, either a Bayesian model can be reduced to a frequentist one, or the Bayesian model gives more accurate predictions.
That said, not even this is a problem. Since I’m learning the subject, I’m not at the stage of saying “this sentence is wrong”. I’m at the stage of “this sentence doesn’t make sense in the context of Bayesianism”. So I’m asking “is there a book that teaches ML from a Bayesian point of view?”.
The answer I’m discovering, appallingly but maybe not so, is no.
As for the fervent defence, under the premises elucidated in the comments, I hold none of the myths, so it doesn’t apply.
I typically see this stated as “there is a Bayesian interpretation for every effective statistical technique.” As pointed out elsewhere, typically people use “frequentist” to mean “non-Bayesian,” which is not particularly effective as a classification.
Did you google Bayesian Machine Learning, or search for it on Amazon? Barber is a well-rated textbook available online for free. (I haven’t read it; Sebastien Bratieres thinks it’s comparable to Murphy, the second most popular ML book, which is Bayesian.) Incidentally, Bishop, the most popular ML book, is also Bayesian. You managed to find the only ML textbook I’ve seen which has, as a comment in one of the Amazon reviews, a positive comment that the book is not Bayesian!
The more meta point here is to not let a worldview shut you out from potentially useful resources. Yes, Bayesianism is the best philosophy of probability, but that does not mean it is the most effective practice of statistics, and excluding concepts or practices from your knowledge of statistics because of a disagreement on philosophy is parochial and self-limiting.
Reducing a frequentist model to a Bayesian one, though, is not a pointless exercise, since it elucidates the hidden assumptions, and at least you become better aware of the model’s field of applicability.
Only after buying the book I have :/ Bishop, though, seems very interesting, thanks!
Thankfully, I’m learning ML for my own education; it’s not something I need to practice right now.
You’re welcome! I should point out that the other words I was considering using to describe Bishop are “classic” and “venerable”—it’s not out of date (most actively used ML methods are surprisingly old), but you may want to read it in parallel with Barber. (In general, if you’ve never read textbooks in parallel before, I recommend it as a lesson in textbook design / pedagogy.)
Using Bishop in my class this Fall, very popular for good reason.
I think it’s very useful, when you are a beginner, to be able to listen to someone with domain expertise telling you when you are wrong.
But then I’m allowed to ask “why?”, and if the answer is “because I say so”, then I feel pretty confident to dismiss the expert.
But that’s not even the stage I’m at. A book is not an interactive medium, so the exchange has gone like this:
book: Cross-validation!
me: “Gaaaak! That sounds totally wrong! Is there anyone who can explain to me either why this is right or, if it’s actually wrong, what the correct approach is?”
I’m still searching for an answer...
Try this paper or page 403 of this textbook.
Also, although in this case there seems to be an available answer, I don’t think it makes sense to always expect that. Sometimes people find a technique that tends to work in practice and then only later come up with a theoretical explanation of why it works. If you happen to live in the period in between...
Heh! I’ve suddenly remembered that LW was founded exactly because the fields of AI and ML used too much frequentist (il)logic. The Sequences were meant to restore sanity to the field.
Anyway, the textbook you mentioned seems pretty cool, thank you very much!
I’m no expert at machine learning. However, as far as I remember, the point of doing cross-validation is to find out whether your model is robust. Robustness is not a standard “Bayesian” concept. Maybe you don’t appreciate its value?
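To make the technique being argued about concrete, here is a minimal sketch of k-fold cross-validation in plain Python. All the names (`train_fn`, `error_fn`, the toy data) are illustrative, not from any particular library:

```python
import random

def k_fold_cross_validate(data, k, train_fn, error_fn):
    """Average held-out error over k folds.  All names are illustrative:
    data is a list of (x, y) pairs, train_fn(train) returns a fitted model,
    error_fn(model, test) returns that model's error on held-out pairs."""
    data = data[:]                            # don't mutate the caller's list
    random.shuffle(data)
    folds = [data[i::k] for i in range(k)]    # k roughly equal folds
    errors = []
    for i in range(k):
        test = folds[i]
        train = [pair for j, fold in enumerate(folds) if j != i for pair in fold]
        model = train_fn(train)
        errors.append(error_fn(model, test))
    return sum(errors) / k

# Toy usage: the "model" is just the training mean of y, scored by squared error.
data = [(x, 2.0 * x) for x in range(20)]
train_mean = lambda train: sum(y for _, y in train) / len(train)
mse = lambda m, test: sum((y - m) ** 2 for _, y in test) / len(test)
cv_error = k_fold_cross_validate(data, k=5, train_fn=train_mean, error_fn=mse)
```

The point of the procedure is exactly the robustness check described above: every data point gets scored by a model that never saw it during training, so the averaged error estimates out-of-sample performance rather than fit to the training set.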
I would appreciate it if there were an explanation of why something is done the way it is. Instead, it’s all about learning the passwords. Maybe it’s just that the main textbook in the field is pedagogically bad; it wouldn’t be the first time.
Getting a deep understanding of a complex field like machine intelligence isn’t easy. You shouldn’t expect it to be easy, or something that you can acquire in a few days.
This is probably very arrogant of me to say, but my advice would be: “Listen to the domain expert when he tells you what you should do… and then find a Bayesian and let them explain to you why that works.”
In my defense, this was my personal experience with statistics at school. I was very good at math in general, but statistics somehow didn’t “click”. I always had this feeling as if what was explained was built on some implicit assumptions that no one ever mentioned explicitly, so unlike with the rest of math, I had no choice here but to memorize that in situation x you should do y, because, uhm, that’s what my teachers told me to do.

More than ten years later, I read LW, and here I am told that yes, the statistics I was taught does have implicit assumptions, and suddenly it all makes sense. And it makes me very angry that no one told me this stuff at school.

I am a “deep learner” (this, not this), and I have a problem learning something when I am told how but can’t find out why. Most people probably don’t have a problem with this: they are told how, and they do, and can be quite successful with it; probably later they will also get an idea of why. But I need to understand the stuff from the very beginning, otherwise I can’t do it well. Telling me to trust a domain expert does not help; I may place a lot of confidence in the how, but I still don’t know why.
ChristianKI is not telling you to trust a domain expert, but rather to read / listen to the domain expert long enough to understand what they are saying (rather than instantly assuming they are wrong because they say something that seems to conflict with your preconceived notions).
I think if you were to read most machine learning books, you would get quite a lot of “why”. See this manuscript for instance. I don’t really see why you think that Bayesians have a monopoly on being able to explain things.
I think you make a mistake if you put a school teacher who doesn’t understand statistics on a deep level into the same category as academic machine learning experts who don’t happen to be “Bayesians”.
Ok, thank you for your time.
There is the probabilistic programming community, which uses clean tools (programming languages) to hand-construct models with many unknown parameters. They use approximate Bayesian methods for inference, and they are slowly improving the efficiency and scalability of those techniques.
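As a toy illustration of the kind of approximate Bayesian inference these systems automate, here is a hand-rolled Metropolis sampler for a coin’s unknown bias. The model, the data (7 heads in 10 flips), and all the names are hypothetical, chosen only to show the mechanics:

```python
import math
import random

random.seed(0)

# Hypothetical toy model: a coin with unknown bias p, flat prior on (0, 1),
# observed HEADS heads out of FLIPS flips.
HEADS, FLIPS = 7, 10

def log_posterior(p):
    if not 0.0 < p < 1.0:
        return float("-inf")  # zero prior mass outside (0, 1)
    return HEADS * math.log(p) + (FLIPS - HEADS) * math.log(1.0 - p)

# Metropolis random walk: propose a nearby p, accept with prob min(1, ratio).
samples, p = [], 0.5
for _ in range(20000):
    proposal = p + random.gauss(0.0, 0.1)
    log_ratio = log_posterior(proposal) - log_posterior(p)
    if random.random() < math.exp(min(0.0, log_ratio)):
        p = proposal
    samples.append(p)

# Discard burn-in; the mean should approach the exact Beta(8, 4) mean, 2/3.
posterior_mean = sum(samples[5000:]) / len(samples[5000:])
```

A probabilistic programming language lets you state just the model (the `log_posterior` part) and supplies the inference loop for you; the efficiency/scalability work mentioned above is largely about replacing this naive random walk with smarter samplers or variational approximations.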
Then there is the neural net &amp; optimization community, which uses general automated models. It is more ‘frequentist’ (or perhaps just ad hoc), but there are now some Bayesian inroads there too. That community has the most efficient and scalable learning methods, but it isn’t always clear what tradeoffs they are making.
And even in the ANN world, you sometimes see Bayesian statistics brought in to justify regularizers or to derive things, such as in variational methods. But then for the actual learning they take gradients and use SGD, with the understanding that SGD somehow approximates the Bayesian inference step, or at least does something close enough.
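The best-known instance of “Bayesian statistics brought in to justify regularizers” is that L2 regularization (weight decay) coincides with MAP estimation under a zero-mean Gaussian prior on the weights. A minimal one-parameter sketch, with made-up data values:

```python
# Ridge regression with a single weight w: minimizing
#     sum_i (y_i - w * x_i)**2 + lam * w**2
# has the closed form below.  The same w is the MAP estimate under a
# Gaussian likelihood plus a zero-mean Gaussian prior on w, because the
# log-posterior equals the negative regularized loss up to constants.
# The data values here are made up for illustration.
def ridge_w(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

w_unreg = ridge_w(xs, ys, lam=0.0)  # plain least squares
w_map = ridge_w(xs, ys, lam=5.0)    # shrunk toward the prior mean, 0
```

Larger `lam` corresponds to a tighter prior, pulling the estimate toward zero; with `lam=0` the prior is flat and you recover the ordinary least-squares answer.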