I have to say, I seriously don’t get the Bayesian vs Frequentist holy wars. It seems to me the ratio of the debate’s importance to the education of its participants is ridiculously low.
Bayesian and frequentist methods are sets of statistical tools, not sacred orders to which you pledge a blood oath. Just understand the use of each tool, and the fact that virtually any model of something that happens in the real world is going to be misspecified.
It’s because Bayesian methods really do claim to be more than just a set of tools. They are supposed to be universally applicable.
This is a bit of an exaggeration.
Additionally, you are only talking about the ‘sets of statistical tools’, where in my experience the bigger disagreement often lies in whether a person accepts that probabilities can be subjective or not. And yes, this does matter.
Can you please give an example of where the possible subjectivity of probabilities matter? I mean this in earnest.
‘From my point of view the probability for X is Y, but from his point of view at the time it would’ve been Z’ (subjective) vs. ‘The probability for X is Y’ (‘objective’).
Honestly though, frequentists use subjective probabilities all the time, and you can argue that frequentism is just as subjective as Bayesianism, so even that disagreement is quite muddy.
Can you be more concrete? When would this matter for two people trying to share a model and make predictions of future events?
Part of it is that Bayesianism claims to be not just a better statistical tool, but a new and better epistemology, a replacement and improvement over Aristotelian logic.
There are a bunch of issues involved. It’s hard to speak about them because the term “Bayesianism” encompasses a wide array of ideas, and every time it’s used it might refer to a different subset of that cluster.
Part of LW is that it’s a place to discuss how an AGI could be structured. As such we care about the philosophical level of how you come to know that something is true, so there is an interest in going as basic as possible when looking at epistemology. There are issues about objective knowledge versus “subjective” Bayesian priors that are worth thinking about.
We live at a time when up to 70% of scientific research can’t be replicated. Frequentism might not be to blame for all of that, but it does play its part. There are issues such as the Bem paper on porno-precognition, where frequentist techniques suggested that porno-precognition is real but analysing Bem’s data with Bayesian methods suggested it’s not.
A further issue is that a lot of additional assumptions are loaded into the word “Bayesianism” when it’s used on LessWrong. “What Bayesianism taught me” speaks about a bunch of issues that have only indirectly to do with Bayesian vs. frequentist tools.
Let’s say I want to decide how much salt I should eat. I follow the consensus that salt is bad and therefore have some prior that salt is bad. Then a new study comes along and says that low-salt diets are unhealthy. If I want to make good decisions I have to ask: how much should I update? There is no good formal way of making such decisions; we lack a good framework for doing this. Bayes’ rule is the answer that provides the promise of a solution to that problem. The alternative, waiting a few years and then reading a meta-review, is unsatisfying.
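For concreteness, a minimal sketch of what such an update could look like in odds form. The numbers here are invented, and the hard part, turning a study into a single Bayes factor, is assumed away; that is exactly the step we lack a framework for:

```python
def update(prior_prob, bayes_factor):
    """Bayes' rule in odds form: posterior odds = prior odds * Bayes factor."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)

# Hypothetical numbers: I'm 80% sure salt is bad; I judge the new study's
# data to be 3x more likely in a world where salt is NOT bad.
print(update(0.80, 1 / 3.0))  # ~0.57: still leaning "salt is bad", but much less sure
```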
In the absence of a formal way to do the reasoning, many people use informal ways of updating on new evidence. Cognitive bias research suggests that the average person isn’t good at this.
That sentence is quite easy to say, but it effectively means there is no such thing as pure, absolute, objective truth. If you use tools A you get truth X and if you use tools B you get truth Y, and neither X nor Y is “more true”. That’s not an appealing conclusion to many people.
Full disclosure: I have papers using B (on structure learning using BIC, which is an approximation to a posterior of a graphical model), and using F (on estimation of causal effects). I have no horse in this race.
See, “Bayes’ rule provides the promise of a solution” is precisely the kind of stuff that makes me shudder, the kind that regularly appears on LW in an endless stream. While Scott Alexander is busy bible-thumping data analysts on his blog, people here say stuff like this.
Bayes’ rule doesn’t provide shit. Bayes’ rule just says that p(A | B) p(B) = p(B | A) p(A).
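Spelled out, the identity rearranges (for p(B) > 0) into the familiar form, and that really is all of it:

$$p(A \mid B) = \frac{p(B \mid A)\, p(A)}{p(B)}$$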
Here’s what you actually need to make use of info in this study:
(a) Read the study.
(b) See if they are actually making a causal claim.
(c) See if they are using experimental or observational data.
(d) Experimental? Do we believe the setup? Are we in a similar cohort? What about experimental design issues? Observational? Do they know what they are doing re: causality from observational data? Is the model that permits this airtight? (Usually it is not; see Scott’s post on “adjusting for confounders”. Generally, to really believe that adjusting for confounders is reasonable, you need a case where you know all confounders are recorded by the definition of the study, for instance if doctors prescribe medicine based only on recorded info in the patient file.)
(e) etc etc etc
I mean, what exactly did you expect, a free lunch? Getting causal info and using it is hard.
p.s. If you are skeptical about statistics papers that adjust for confounders, you should also be skeptical about missing-data papers that assume MAR (missing at random). It is literally the same assumption.
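A sketch of the parallel, in the textbook potential-outcome framing (with C the recorded confounders, X the treatment, Y(x) the potential outcome, R the missingness indicator, and Y_obs / Y_mis the observed and missing parts of the data):

$$\text{no unmeasured confounding:}\quad Y(x) \perp X \mid C \qquad\qquad \text{MAR:}\quad R \perp Y_{\mathrm{mis}} \mid Y_{\mathrm{obs}}$$

Both say that, conditional on what was recorded, the mechanism you worry about (treatment assignment in one case, missingness in the other) carries no further information about the unobserved quantity.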
You might want to read a bit more precisely. I chose my words deliberately when I said “promise of a solution” instead of “a solution”.
In particular, MetaMed speaks about wanting to produce a system for Bayesian analysis of medical papers (“Bayesian mathematical assessment of diagnosis”).
You miss the point. When it comes to interviewing candidates for a job, we have found out that unstructured human assessment doesn’t work that well.
It could very well be that the standard unstructured way of reading papers is not optimal either, and that we should have Bayesian belief nets into which we plug numbers such as whether a study is experimental or observational (see the toy sketch below).
Whether MetaMed or someone else succeeds at that task and provides a real improvement on the status quo isn’t certain, but there are ideas to explore.
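To make that slightly less hand-wavy, here is a toy sketch of the kind of thing I mean. Every feature and number in it is invented, and the naive independence assumption is doing a lot of work; it is an illustration of the idea, not MetaMed’s method:

```python
# Invented likelihood ratios for how much each feature of a paper
# should shift the odds that its headline claim is real.
BAYES_FACTORS = {
    "experimental": 3.0,
    "observational": 0.8,
    "preregistered": 2.0,
    "small_sample": 0.5,
}

def assess(prior_prob, features):
    """Multiply prior odds by one factor per feature (naive independence)."""
    odds = prior_prob / (1 - prior_prob)
    for f in features:
        odds *= BAYES_FACTORS[f]
    return odds / (1 + odds)

print(assess(0.10, ["observational", "small_sample"]))  # ~0.04
print(assess(0.10, ["experimental", "preregistered"]))  # ~0.40
```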
Is it clear that MetaMed, as a group of self-professed Bayesians, provides a useful service? Maybe, maybe not. On the other hand, the philosophy on which MetaMed operates is not the standard philosophy on which the medical establishment operates.
I don’t know how Metamed works (and it’s sort of their secret sauce, so they probably will not tell us without an NDA). I am guessing it is some combination of doing (a) through (e) above for someone who cannot do it themselves, and possibly some B stats. Which seems like a perfectly sensible business model to me!
I don’t think the secret sauce is in the B stats part of what they are doing, though. If we had a hypothetical company called “Freqmed” that also humanwaved (a) through (e), and then used F stats I doubt they would get non-sensible answers. It’s about being sensible, not your identity as a statistician.
I can be F with Bayes nets. Bayes nets are just a conditional independence model.
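To illustrate: a sketch of fitting one conditional probability table of a chain X -> Y -> Z by plain maximum likelihood, i.e. counting, with no prior anywhere. The data are made up:

```python
# Made-up binary observations of (x, y, z) for a chain X -> Y -> Z,
# whose factorization p(x, y, z) = p(x) p(y|x) p(z|y) encodes the
# conditional independence Z _||_ X | Y.
data = [(0,0,0), (0,0,1), (0,1,1), (1,1,1), (1,1,0), (1,0,0), (1,1,1), (0,0,0)]

def mle_p_y_given_x(data, x):
    """Frequentist (maximum likelihood) estimate of p(Y=1 | X=x): just count."""
    ys = [y for (xx, y, _) in data if xx == x]
    return sum(ys) / len(ys)

for x in (0, 1):
    print(f"p(Y=1 | X={x}) = {mle_p_y_given_x(data, x):.2f}")
```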
I don’t know how successful Metamed will be, but I honestly wish them the best of luck. I certainly think there is a lot of crazy out there in data analysis, and it’s a noble thing to try to make money off of making things more sensible.
The thing is, I don’t know about a lot of the things that get talked about on LW. I do know about B and F a little bit, and about causality a little bit. And a huge chunk of stuff people say is just plain wrong. So I tell them it’s wrong, but they keep going and don’t change what they say at all. So how should I update? That folks in this rationalist community generally don’t know what they are talking about and refuse to change?
It’s like Wikipedia: the first sentence of the article on confounders is wrong (there is a very simple 3-node example that violates that definition). The talk page on Bayesian networks is a multi-year tale of woe and ignorance. I once got into an edit war with a resident bridge troll on that article, and eventually gave up and left, because he had more time. What does that tell me about Wikipedia?
But we don’t have a Freqmed. MetaMed did come out of a certain kind of thinking; the project had a motivation.
Just because you know what people in the statistics community mean when they say “Bayesian” doesn’t automatically mean that you know what someone on LW means when he says “Bayesian”.
If you look at “What Bayesianism taught me”, there is a person who changed their beliefs through learning about Bayesianism. Do the points he makes have something to do with frequentism vs. Bayesianism? Not directly. On the other hand, he did change major beliefs about how he thinks about the world and epistemology.
That means that the term Bayesianism as used in that article isn’t completely empty.
Sensiblism might be a fun name for a philosophy. At the first LW meetup I attended, one of the participants had a scooter. My first question was about his traveling speed and how much time he effectively saves by using it. To that question he gave a normal answer.
My second question was about the accident rate of scooters. He replied something along the lines of: “I really don’t know, I should research the issue more in depth and get the numbers.” That’s not the kind of answer normal people give when asked about the safety of their mode of travel.
You could say he’s simply sensible while the 99% of the population out there who would answer the question differently aren’t. On the other hand, it’s quite difficult to explain to those 99% that they aren’t sensible. If you prod them a bit they might admit that knowing accident risks is useful for making a decision about one’s mode of travel, but they don’t update on a deep level.
Then people like you come along and say: “Well, of course we should be sensible. There’s no need to point it out explicitly or to give it a fancy name. Being sensible should go without saying.”
The problem is that in practice it doesn’t go without saying, and speaking about it is hard. Calling it Bayesianism might be a very confusing way to speak about it, but it seems to be an improvement over having no words at all. Maybe tabooing “Bayesianism” as a word on LW would be the right choice. Maybe the word produces more problems than it solves.
“In statistics, a confounding variable (also confounding factor, a confound, or confounder) is an extraneous variable in a statistical model that correlates (directly or inversely) with both the dependent variable and the independent variable.” That is the sentence at the moment. How would you change it? There is no reason why we shouldn’t fix the issue right now.
Counterexamples to a definition (an example that falls under your definition but is clearly not what we mean by “confounder”; a mediator on the causal path from the independent to the dependent variable correlates with both, for instance, yet is not a confounder) are easier than a definition. A lot of analytic philosophy is about this. Defining “intuitive terms” is often not as simple as it seems. See, e.g.:
http://arxiv.org/abs/1304.0564
If you think you can make a “sensible” edit based on this paper, I will be grateful if you do so!
re: the rest of your post, words mean things. B is a technical term. I think if you redefine B as internal jargon for LW you will be incomprehensible to stats/ML people, and you don’t want this. Communication across fields is hard enough as it is (“academic coordination problem”), let’s not make it harder by not using standard terminology.
I am 100% behind this idea (and, in general, taboo technical terms unless you really know a lot about them).
But that doesn’t solve the problem of Wikipedia being, in your judgement, wrong on this point.
If you look in a dictionary you will find that most words have multiple meanings. They also happen to evolve meaning over time.
Let’s see if I can precommit to not posting here anymore.
Speaking of “being sensible, not your identity as a statistician”: here is an interesting paper which distinguishes the Fisher approach to testing from the Neyman-Pearson approach and shows how you can unify/match some of that with Bayesian methods.
It seems to me that there’s a bigger risk from Bayesian methods. They’re more sensitive to small effect sizes (doing a frequentist meta-analysis you’d count a study that got a p=0.1 result as evidence against; doing a Bayesian one, it might be evidence for). If the prior isn’t swamped then it’s important, and we don’t have good best practices for choosing priors; if the prior is swamped then the Bayesianism isn’t terribly relevant. And simply having more statistical tools available and giving researchers more choices makes it easier for bias to creep in.
Bayes’ theorem is true (duh), and I’d accept that there are situations where Bayesian analysis is more effective than frequentist, but I think it would do more harm than good in formal science.
Why would you do that? If I got a p=0.1 result doing a meta-analysis, I wouldn’t be surprised at all, since things like random effects mean it takes a lot of data to turn in a positive result at the arbitrary threshold of 0.05. And as it happens, in some areas an alpha of 0.1 is acceptable: for example, because of the poor power of tests for publication bias, you can find respected people like Ioannidis using that particular threshold (I believe I last saw that in his paper on the binomial test for publication bias).
If people really acted that way, we’d see an odd phenomenon where people watched successive meta-analyses on whether grapes cure cancer: 0.15 that grapes cure cancer (decreases belief that grapes cure cancer), 0.10 (decreases), 0.07 (decreases), then someone points out that a random-effects model is inappropriate because the studies show very low heterogeneity, the better fixed-effects analysis suddenly reveals that the p-value is now 0.05, and everyone’s beliefs radically flip as they go from ‘grapes have been refuted and are quack alt medicine!’ to ‘grapes cure cancer! quick, let’s apply to the FDA under a fast track’. Instead, we see people acting more like Bayesians...
Is that a guess, or a fact based on meta-studies showing that Bayesian-using papers cook the books more than NHST users with p-hacking etc?
Turns out I am overoptimistic, and in some cases people have done just that: interpreted a failure to reject the null (due to insufficient power, despite the data being evidence for an effect) as disproving the alternative, in a series of studies which all pointed the same way, only changing their minds when an individually big enough study came out. Hauer says this is exactly what happened with a series of studies on traffic mortalities.
(As if driving didn’t terrify me enough, now I realize traffic laws and road safety designs are being engineered by vulgarized NHST practitioners who apparently don’t know how to patch the paradigm up with emphasis on power or meta-analysis.)
No. The most basic version of meta-analysis is, roughly, that if you have two p=0.1 studies, the combined conclusion is p=0.01.
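For what it’s worth, multiplying the two p-values does give 0.01, but that isn’t a valid combined p-value; a standard combination rule is less generous. A quick check with Fisher’s method (using scipy’s combine_pvalues, assuming I remember the API correctly):

```python
from scipy import stats

# Fisher's method: -2 * sum(ln p_i) ~ chi-squared with 2k degrees of freedom.
stat, p_combined = stats.combine_pvalues([0.1, 0.1], method="fisher")
print(f"combined p = {p_combined:.3f}")  # ~0.056, not 0.1 * 0.1 = 0.01
```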
To all your points about the overloading of “Bayesian”, fair enough. I guess I just don’t see why that overloading is necessary.
Sure, Bayes’ rule provides a formalization of updating beliefs based on evidence, but you can still be dead wrong. In particular, setting a prior on any given issue isn’t enough: you have to be prepared to update on evidence of the form “I am really bad at setting priors”. And really, priors are just a (possibly arbitrary) way of digesting existing evidence. Sometimes they can be very useful (avoiding privileging the hypothesis), but sometimes they are quite arbitrary.
According to the Slate Star Codex article, Bem’s results stand up to Bayesian analysis quite well (that is, they have a strong Bayes factor). The only exception he mentioned was “I begin with a very low prior for psi phenomena, and a higher prior for the individual experiments and meta-analysis being subtly corrupt”; but there’s nothing especially helpful about this in actually fixing the experimental design and meta-analysis.
How you get from AGI to epistemology eludes me. As long as the AGI can accurately model its interactions with the environment, that’s really all it needs (or can hope) to do.
One of them is more useful for prediction and inference. They can guide you towards observing mechanisms useful for future hypothesis generation. That’s all you can hope for. Especially in the case of “are low-salt diets healthy”. A “Yes” or “No” to that question will never be truthful, because “health” and “for what segments of the population” and “in conjunction with what other lifestyle factors” are left underspecified. And you’ll never get rid of the kernel of doubt that the low-sodium lobby has been the silent force behind all the anti-salt research this whole time.
The best you can do is provide enough evidence that anyone who points out that your hypothesis is not the truth can reasonably be called a pedant or a conspiracy theorist, but they are not 100% guaranteed to be wrong.
As you might see, I am a fan of the idea of Dissolving epistemology.
Can you point to examples of these “holy wars”? I haven’t encountered something I’d describe like that, so I don’t know if we’ve been seeing different things, or just interpreting it differently.
To me it looks like a tension between a method that’s theoretically better but not well-established, and a method that is not ideal but more widely understood and so more convenient; a bit like the tension between the metric and imperial systems, or between Flash and HTML5.
The term “holy war” or “religious war” is often used to describe debates where people advocate for a side with an intensity disproportionate to the stakes (e.g. the proper pronunciation of “gif”, vi vs. emacs, surrogate vs. natural primary keys in RDBMSs). That’s how I read the OP, and it’s fitting in context.
Sure, I’m just not sure which debates he’s referring to … is it on LessWrong? Elsewhere?
[etc.]
Ugh. Here is a good heuristic:
“Not in stats or machine learning? Stop talking about this.”
Dude, I’m being genuinely curious about what “holy wars” he’s talking about. So far I got:
a definition of “holy war” in this context
a snotty “shut up, only statisticians are allowed to talk about this topic”
… but zero actual answers, so I can’t even tell if he’s talking about some stupid overblown bullshit, or if he’s just exaggerating what is actually a pretty low-key difference in opinion.
A “holy war” between Bayesians and frequentists exists in the modern academic literature for statistics, machine learning, econometrics, and philosophy (this is a non-exhaustive list).
Bradley Efron, who is arguably the most accomplished statistician alive, wrote the following in a commentary for Science in 2013 [1]:
The term “controversial theorem” sounds like an oxymoron, but Bayes’ theorem has played this part for two-and-a-half centuries. Twice it has soared to scientific celebrity, twice it has crashed, and it is currently enjoying another boom. The theorem itself is a landmark of logical reasoning and the first serious triumph of statistical inference, yet is still treated with suspicion by most statisticians. There are reasons to believe in the staying power of its current popularity, but also some signs of trouble ahead.
[...]
Bayes’ 1763 paper was an impeccable exercise in probability theory. The trouble and the subsequent busts came from overenthusiastic application of the theorem in the absence of genuine prior information, with Pierre-Simon Laplace as a prime violator. Suppose that in the twins example we lacked the prior knowledge that one-third of twins are identical. Laplace would have assumed a uniform distribution between zero and one for the unknown prior probability of identical twins, yielding 2⁄3 rather than 1⁄2 as the answer to the physicists’ question. In modern parlance, Laplace would be trying to assign an “uninformative prior” or “objective prior”, one having only neutral effects on the output of Bayes’ rule. Whether or not this can be done legitimately has fueled the 250-year controversy.
Frequentism, the dominant statistical paradigm over the past hundred years, rejects the use of uninformative priors, and in fact does away with prior distributions entirely. In place of past experience, frequentism considers future behavior. An optimal estimator is one that performs best in hypothetical repetitions of the current experiment. The resulting gain in scientific objectivity has carried the day, though at a price in the coherent integration of evidence from different sources, as in the FiveThirtyEight example.
The Bayesian-frequentist argument, unlike most philosophical disputes, has immediate practical consequences.
In another paper published in 2013, Efron wrote [2]:
The two-party system [Bayesian and frequentist] can be upsetting to statistical consumers, but it has been a good thing for statistical researchers — doubling employment, and spurring innovation within and between the parties. These days there is less distance between Bayesians and frequentists, especially with the rise of objective Bayesianism, and we may even be heading toward a coalition government.
The two philosophies, Bayesian and frequentist, are more orthogonal than antithetical. And of course, practicing statisticians are free to use whichever methods seem better for the problem at hand — which is just what I do.
Thirty years ago, Efron was more critical of Bayesian statistics [3]:
A summary of the major reasons why Fisherian and NPW [Neyman–Pearson–Wald] ideas have shouldered Bayesian theory aside in statistical practice is as follows:
Ease of use: Fisher’s theory in particular is well set up to yield answers on an easy and almost automatic basis.
Model building: Both Fisherian and NPW theory pay more attention to the preinferential aspects of statistics.
Division of labor: The NPW school in particular allows interesting parts of a complicated problem to be broken off and solved separately. These partial solutions often make use of aspects of the situation, for example, the sampling plan, which do not seem to help the Bayesian.
Objectivity: The high ground of scientific objectivity has been seized by the frequentists.
None of these points is insurmountable, and in fact, there have been some Bayesian efforts on all four. In my opinion a lot more such effort will be needed to fulfill Lindley’s prediction of a Bayesian 21st century.
The following bit of friendly banter in 1965 between M. S. Bartlett and John W. Pratt shows that the holy war was ongoing 50 years ago [4]:
Bartlett: I am not being altogether facetious in suggesting that, while non-Bayesians should make it clear in their writings whether they are non-Bayesian Orthodox or non-Bayesian Fisherian, Bayesians should also take care to distinguish their various denominations of Bayesian Epistemologists, Bayesian Orthodox and Bayesian Savages. (In fairness to Dr Good, I could alternatively have referred to Bayesian Goods; but, oddly enough, this did not sound so good.)
Pratt: Professor Bartlett is correct in classifying me a Bayesian Savage, though I might take exception to his word order. On the whole, I would rather be called a Savage Bayesian than a Bayesian Savage. Of course I can quite see that Professor Bartlett might not want to admit the possibility of a Good Bayesian.
For further reading I recommend [5], [6], [7].
[1]: Efron, Bradley. 2013. “Bayes’ Theorem in the 21st Century.” Science 340 (6137) (June 7): 1177–1178. doi:10.1126/science.1236536.
[2]: Efron, Bradley. 2013. “A 250-Year Argument: Belief, Behavior, and the Bootstrap.” Bulletin of the American Mathematical Society 50 (1) (April 25): 129–146. doi:10.1090/S0273-0979-2012-01374-5.
[3]: Efron, B. 1986. “Why Isn’t Everyone a Bayesian?” American Statistician 40 (1) (February): 1–11. doi:10.1080/00031305.1986.10475342.
[4]: Pratt, John W. 1965. “Bayesian Interpretation of Standard Inference Statements.” Journal of the Royal Statistical Society: Series B (Methodological) 27 (2): 169–203. http://www.jstor.org/stable/2984190.
[5]: Senn, Stephen. 2011. “You May Believe You Are a Bayesian but You Are Probably Wrong.” Rationality, Markets and Morals 2: 48–66. http://www.rmm-journal.com/htdocs/volume2.html.
[6]: Gelman, Andrew. 2011. “Induction and Deduction in Bayesian Data Analysis.” Rationality, Markets and Morals 2: 67–78. http://www.rmm-journal.com/htdocs/volume2.html.
[7]: Gelman, Andrew, and Christian P. Robert. 2012. “‘Not Only Defended but Also Applied’: The Perceived Absurdity of Bayesian Inference”. Statistics; Theory. arXiv (June 28).
Ilya responded to your second paragraph, not the first one. Metric vs. imperial or Flash vs. HTML5 are not good analogies.
For lots of “holy war” anecdotes, see The Theory That Would Not Die by Sharon Bertsch McGrayne.
Do you consider personal insults, accusations of fraud, or splitting academic departments along party lines to be “a pretty low-key difference in opinion”? If so, then it is “overblown bullshit,” otherwise it isn’t.
Various bits of Jaynes’s “Confidence intervals vs Bayesian intervals” seem holy war-ish to me. Perhaps the juiciest bit (from pages 197-198, or pages 23-24 of the PDF):
I first presented this result to a recent convention of reliability and quality control statisticians working in the computer and aerospace industries; and at this point the meeting was thrown into an uproar, about a dozen people trying to shout me down at once. They told me, “This is complete nonsense. A method as firmly established and thoroughly worked over as confidence intervals can’t possibly do such a thing. You are maligning a very great man; Neyman would never have advocated a method that breaks down on such a simple problem. If you can’t do your arithmetic right, you have no business running around giving talks like this”.
After partial calm was restored, I went a second time, very slowly and carefully, through the numerical work [...] with all of them leering at me, eager to see who would be the first to catch my mistake [...] In the end they had to concede that my result was correct after all.
To make a long story short, my talk was extended to four hours (all afternoon), and their reaction finally changed to: “My God – why didn’t somebody tell me about these things before? My professors and textbooks never said anything about this. Now I have to go back home and recheck everything I’ve done for years.”
This incident makes an interesting commentary on the kind of indoctrination that teachers of orthodox statistics have been giving their students for two generations now.