What abstraction describes the scientist’s thought process, the engine within the scientific method? I suggest it is Bayesian reasoning but even if it is not, one thing it cannot be is more of the Scientific method, as that would lead to recursion. If it is not Bayesian reasoning, no matter, Bayesianism is a failed complete explanation and the Scientific method is half an adequate method—they are still different from each other.
I don’t think scientists think about it much. That’s more the sort of thing philosophers of science think about. The smarter scientists do what is essentially Bayesian updating, although very few of them would actually put a number on their prior and calculate their posterior based on a surprising p value. They just know that it takes a lot of very good evidence to overturn a well-established theory, and not so much evidence to establish a new claim consistent with the existing scientific knowledge.
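To put rough numbers on that asymmetry, here is a minimal sketch of a single Bayesian update. The two priors and the two likelihoods are invented for illustration; nothing in the thread supplies them.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' theorem for a binary hypothesis A given evidence B."""
    numerator = p_evidence_if_true * prior
    return numerator / (numerator + p_evidence_if_false * (1.0 - prior))

# Illustrative assumption: the evidence is 20 times more likely if the claim is true.
P_B_GIVEN_A = 0.20
P_B_GIVEN_NOT_A = 0.01

# Hypothetical prior for a claim consistent with existing knowledge.
consistent_claim = posterior(0.5, P_B_GIVEN_A, P_B_GIVEN_NOT_A)

# Hypothetical prior for a claim that contradicts a well-established theory.
revolutionary_claim = posterior(0.001, P_B_GIVEN_A, P_B_GIVEN_NOT_A)

print(round(consistent_claim, 3), round(revolutionary_claim, 3))
```

Under these assumed numbers the same evidence leaves the first claim very probable and the second still unlikely, which is the informal rule scientists follow without writing down a prior.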
What goes into p-values necessarily involves the arbitrary classes the scientist has decided evidence would fit in, and then measures his or her surprise at the class of evidence that is found. That’s not P(B|~A), it’s P(C|~A).
Stating your hypothesis beforehand and specifying exactly what will and will not count as evidence before you collect your data is a very good way of minimising the effect of your own biases, but naughty scientists can and do take the opportunity to cook the experiment by strategically choosing what will count as evidence. Still, overall it’s better than letting scientists pore over the entrails of their experimental results and make up a hypothesis after the fact. If a great new hypothesis comes out of the data then you have to do your legwork and do a whole new experiment to test the new hypothesis, and that’s how it should be. If the effect is real it will keep. The universe won’t change on you.
Do you have examples of boundary cases that distinguish a rigorously established one from one that isn’t?
It’s not a binary distinction. Rather, if you’re unaware of the ways that people’s P(B) estimates can be wildly inaccurate and think that your naive P(B) estimates are likely to be accurate then you can update into all sorts of stupid and factually false beliefs even if you’re an otherwise perfect Bayesian.
The people who think that John Edward can talk to dead people might well be perfect Bayesians who just haven’t checked to see what the probability is that John Edward could produce the effects he produces in a world where he can’t talk to dead people. If you think the things he does are improbable then it’s technically correct to update to a greater belief in the hypothesis that he can channel dead people. It’s only if you know that his results are exactly what you’d expect in a world where he’s a fake that you can do the correct thing, which is not update your prior belief that the probability that he’s a fake is 99.99...9%.
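The John Edward case can be sketched in odds form, where posterior odds equal prior odds times the likelihood ratio. Every number below is a made-up assumption; the point is only how much the answer depends on the estimate of P(B|~A).

```python
def update(prior_prob, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) using posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    likelihood_ratio = p_b_given_a / p_b_given_not_a
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

PRIOR_REAL = 1e-6      # hypothetical prior that he really talks to the dead
P_HITS_IF_REAL = 0.9   # hypothetical chance of the observed "hits" if he is real

# Naive estimate: a fake could hardly ever produce such hits.
naive = update(PRIOR_REAL, P_HITS_IF_REAL, 0.001)

# Informed estimate: cold reading produces the same hits just as often.
informed = update(PRIOR_REAL, P_HITS_IF_REAL, 0.9)

print(naive, informed)
```

With the naive estimate the belief rises by a factor of several hundred; with the informed estimate the likelihood ratio is 1 and the posterior stays at the prior. Both reasoners apply the theorem correctly, and only one of them has a usable P(B|~A).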
If someone’s done some actual work to see if they can falsify the null hypothesis that PUA techniques are indistinguishable from a change, a comb, a shower and asking some women out I’d be interested in seeing it. In the absence of such work I think good Bayesians have to recognise that they don’t have a P(B) with small enough error bars to be very useful.
Stating your hypothesis beforehand and specifying exactly what will and will not count as evidence before you collect your data is a very good way of minimising the effect of your own biases
Exactly, it’s a cost and a deviation from ideal thinking to minimize the influence of scientists who receive no training in debiasing. So not “If you’re doing Bayes right it’s the same as doing science”, where “science” is an imperfect human construct designed to accommodate the more biased of scientists.
If a great new hypothesis comes out of the data then you have to do your legwork and do a whole new experiment to test the new hypothesis, and that’s how it should be. If the effect is real it will keep. The universe won’t change on you.
These are costs. It’s important, and in some contexts cheap, to know why and how things work instead of saying “I’ll ignore that since enough replication always solves such problems,” when one doesn’t know in which cases one is doing nearly pointless extra work and in which one isn’t doing enough replication. It’s an obviously sub-optimal solution along the lines of “thinking isn’t important; assume infinite resources.”
you can update into all sorts of stupid and factually false beliefs even if you’re an otherwise perfect Bayesian.
It’s praise through faint damnation of the laws of logic that they don’t prevent one from shooting one’s own foot off. Handcuffs are even better at that task, but they are less useful for figuring out what is true.
It’s not a binary distinction.
Exactly, so in “some of the LW groupthink holds that you can do a valid Bayesian update in the absence of a rigorously established base rate,” they are right, and “updating is no better than guesswork in the absence of a rigorously obtained P(B),” is not always true, such as when the following condition doesn’t apply, and it doesn’t here:
if you’re unaware of the ways that people’s P(B) estimates can be wildly inaccurate and think that your naive P(B) estimates are likely to be accurate
What do you think this site is for? People are reading and sharing research papers about biases in their free time. One could likewise criticize jet fuel for being inappropriate for an old fashioned coal powered locomotive. Yes, jet fuel will explode a train...this is not a flaw of jet fuel, and it does not mean that the coal-train is better at transporting things.
If someone’s done some actual work to see if they can falsify the null hypothesis that PUA techniques
That’s not the claim in question.
In any case, there are better ways to think about this subject than with null hypotheses. Those are social constructs that focus (decently) on preventing belief in untrue things, rather than on determining what’s most likely true. Here, false beliefs have relatively less cost than in most of science, and will in any case only be held probabilistically.
Exactly, it’s a cost and a deviation from ideal thinking to minimize the influence of scientists who receive no training in debiasing. So not “If you’re doing Bayes right it’s the same as doing science”, where “science” is an imperfect human construct designed to accommodate the more biased of scientists.
There’s a very good reason why we do double-blind, placebo-controlled trials rather than just recruiting a bunch of people who browse LW to do experiments with, on the basis that since LWers are “trained in debiasing” they are immune to wishful thinking, confirmation bias, the experimenter effect, the placebo effect and so on.
I have a great deal more faith in methodological constructs that make it impossible for bias to have an effect than in people’s claims to “debiased” status.
Don’t get me wrong, I think that training in avoiding cognitive biases is very important because there are lots of important things we do where we don’t have the luxury of specifying our hypotheses in strictly instrumental terms beforehand, collecting data via suitably blinded proxies and analysing it just in terms of our initial hypothesis.
However my view is that if you think that scientific methodology is just a set of training wheels for people who haven’t clicked on all the sequences yet and that browsing LW makes you immune to the problems that scientific methodology exists specifically to prevent then it’s highly likely you overestimate your resistance to bias.
These are costs. It’s important, and in some contexts cheap, to know why and how things work instead of saying “I’ll ignore that since enough replication always solves such problems,” when one doesn’t know in which cases one is doing nearly pointless extra work and in which one isn’t doing enough replication. It’s an obviously sub-optimal solution along the lines of “thinking isn’t important; assume infinite resources.”
There’s also a cost to acting on the assumption that every correlation is meaningful in a world where we have so much data available to us that we can find arbitrarily large numbers of spurious correlations at P<0.01 if we try hard enough. Either way you’re spending resources, but spending resources in the cause of epistemological purity is okay with me. Spending resources on junk because you are not practising the correct purification rituals is not.
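The spurious-correlation point is easy to quantify. This is a minimal sketch that assumes every tested correlation is truly null and that the tests are independent, which real data-dredging rarely satisfies exactly.

```python
ALPHA = 0.01  # significance threshold mentioned above

def chance_of_any_false_positive(num_tests, alpha=ALPHA):
    """P(at least one p < alpha) across independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** num_tests

def expected_false_positives(num_tests, alpha=ALPHA):
    """Expected count of spurious 'significant' results among true nulls."""
    return num_tests * alpha

for num_tests in (1, 10, 100, 500):
    print(
        num_tests,
        round(chance_of_any_false_positive(num_tests), 3),
        expected_false_positives(num_tests),
    )
```

The chance of at least one spurious hit climbs towards certainty as the number of comparisons grows, so a lone P<0.01 result pulled from a large pile of comparisons carries far less weight than the same result from a single pre-specified test.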
It’s praise through faint damnation of the laws of logic that they don’t prevent one from shooting one’s own foot off. Handcuffs are even better at that task, but they are less useful for figuring out what is true.
The accepted scientific methodology is more like a safety rope or seat belt. Sometimes annoying, almost always rational.
What do you think this site is for? People are reading and sharing research papers about biases in their free time. One could likewise criticize jet fuel for being inappropriate for an old fashioned coal powered locomotive. Yes, jet fuel will explode a train...this is not a flaw of jet fuel, and it does not mean that the coal-train is better at transporting things.
Rather than what a site is for I focus on what a site is.
In many, many ways this site has higher quality discourse than, say, the JREF forums and a population who on average are better versed in cognitive biases. However this discussion has made it obvious to me that on average the JREF forumites are far more aware than the LWers of the various ways that people’s estimates of P(B) can be wrong and can be manipulated.
They would never put it in those terms since Bayes is a closed book to them, but they are very well aware that you can work yourself into completely wrong positions if you aren’t sophisticated enough to correctly estimate the actual base rate at which one would expect to observe things like homeopathy apparently working, people apparently talking to the dead, people apparently having psychic powers, NLP apparently letting you seduce people and so on in worlds where none of these things did anything except act as placebos (at best).
If your P(B) is off then using Bayes Theorem is just being a mathematically precise idiot instead of an imprecise idiot. You’ll get to exactly the right degree of misguided belief, based on the degree to which you’re mistaken about the correct value of P(B), but that’s still far worse than being someone who wouldn’t know Bayes from a bar of soap but who intuitively perceives something closer to the correct P(B).
The idea that LW browsers think they are liquid-fuelled jets while the scientists who do the actual work of moving society forward are boring old coal trains worries me. I think of LW’s “researchers” as a bunch of enthusiastic amateurs with cheap compasses and hand-drawn maps running around in the bushes in a mildly organised fashion, while scientists are painstakingly and one inch at a time building a gigantic sixteen-lane highway for us all to drive down.
There’s a very good reason why we do double-blind, placebo-controlled trials rather than just recruiting a bunch of people who browse LW to do experiments with
Yes, and people who actually understand the tradeoffs in using formal scientific reasoning and its deviations from the laws of reasoning are the only people in position to intelligently determine that. Those who say “always use the scientific method for important things” or, though I don’t know that there ever has been or ever will be such a person, “always recruit a bunch of people who browse LW,” are not thinking any more than a broken clock is ticking. As an analogy, coal trains are superior to jet planes for transporting millions of bushels of wheat from Alberta to Toronto. It would be inane and disingenuous for broken records always calling for the use of coal trains to either proclaim their greater efficiency in determining which vehicle to use to transport things because they got the wheat case right or pretend that they have a monopoly on calling for the use of trains.
With reasoning, one can intelligently determine a situation’s particulars and spend to eliminate a bias (for example by making a study double-blind) rather than doing that all the time or relying on skill in this case, and without relying on intuition to determine when. One can see that in an area, the costs of thinking something true when it isn’t exceed the costs of thinking it’s false when it’s true, and set up correspondingly strict protocols, rather than blindly always paying in true things not believed, time, and money for the same, sometimes inadequate and sometimes excessive, amount of skepticism.
However my view is that if you think that scientific methodology is just a set of training wheels for people who haven’t clicked on all the sequences yet and that browsing LW makes you immune to the problems that scientific methodology exists specifically to prevent
My view is that if you think anyone who has interacted with you in this thread has that view you have poor reading comprehension skills.
There’s also a cost to acting on the assumption that every correlation is meaningful
So one can simply...not do that. And be a perfectly good Bayesian.
spending resources in the cause of epistemological purity is okay with me. Spending resources on junk because you are not practising the correct purification rituals is not.
It is not the case that every expenditure reducing the likelihood that something is wrong is optimal, as one could instead spend a bit on determining which areas ought to have extra expenditure reducing the likelihood that something is wrong there.
In any case, science has enshrined a particular few levels of spending on junk that it declares perfectly fine because the “correct” purification rituals have been done. I do not think that such spending on junk is justified because in those cases no, science is not strict enough. One can declare a set of arbitrary standards and declare spending according to them correct and ideologically pure or similar, but as one is spending fungible resources towards research goals this is spurious morality.
You’ll get to exactly the right degree of misguided belief...far worse than being someone who wouldn’t know Bayes from a bar of soap but who intuitively
Amazing, let me try one. If a Bayesian reasoner is hit by a meteor and put into a coma, he is worse off than a non-Bayesian who stayed indoors playing Xbox games and was not hit by a meteor. So we see that Bayesian reasoning is not sufficient to confer immortality and transcendence into a godlike being made of pure energy.
People on this site are well aware that if scientific studies following the same rules as the rest of science indicate that people have psychic powers, there’s something wrong with the scientific method and the scientists’ understanding of it because the notion that people have psychic powers is bullshit.
The idea that LW browsers think they are liquid-fuelled jets while the scientists who do the actual work of moving society forward are boring old coal trains worries me.
People here know that there is not some ineffable magic making science the right method in the laboratory and faith the right method in church, or science the right method in the laboratory and love the right method everywhere else, science the right method everywhere and always, etc., as would have been in accordance with people’s intuitions.
How unsurprising it is that actually understanding the benefits and drawbacks of science leads one to conclude that often science is not strict enough, and often too strict, and sometimes but rarely entirely inappropriate when used, and sometimes but rarely unused when it should be used, when heretofore everything was decided by boggling intuition.
Yes, and people who actually understand the tradeoffs in using formal scientific reasoning and its deviations from the laws of reasoning are the only people in position to intelligently determine that.
I’m not going to get into a status competition with you over who is in a position to determine what.
My view is that if you think anyone who has interacted with you in this thread has that view you have poor reading comprehension skills.
The most obvious interpretation of your statement that science is “an imperfect human construct designed to accommodate the more biased of scientists” and that “it’s a cost and a deviation from ideal thinking to minimize the influence of scientists who receive no training in debiasing” is that you think your LW expertise means that you wouldn’t need those safeguards. If I misinterpreted you I think it’s forgivable given your wording, but if I misinterpreted you then please help me out in understanding what you actually meant.
People on this site are well aware that if scientific studies following the same rules as the rest of science indicate that people have scientific powers, there’s something wrong with the scientific method and the scientists’ understanding of it because the notion that people have psychic powers are bullshit.
I’m responding under the assumption that the second “scientific” should read “psychic”. My point was not that people here didn’t get that—I imagine they all do. My point is that the evidence on the table to support PUA theories is vulnerable to all the same problems as the evidence supporting claimed psychic powers, and that when it came to this slightly harder problem some people here seemed to think that the evidence on the table for PUA was actually evidence we would not expect to see in a world where PUA was placebo plus superstition.
I think the JREF community would take one sniff of PUA and say “Looks like a scam based on a placebo”, and that they would be better Bayesians when they did so than anyone who looks at the same evidence and says “Seems legit!”.
(I suspect that the truth is that PUA has a small non-placebo effect, since we live in a universe with ample evidence that advertising and salesmanship have small non-placebo effects that are statistically significant if you get a big enough sample size. However I also suspect that PUAs have no idea which bits of PUA are the efficacious bits and which are superstition, and that they could achieve the modest gains possible much faster if they knew which was which).
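As a rough illustration of how a small real effect becomes statistically significant once the sample is large enough, here is a sketch using a one-sample z-test. The standardized effect size of 0.05 is a hypothetical figure, and the calculation assumes the observed effect comes out equal to the true one.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(standardized_effect, sample_size):
    """p-value of a one-sample z-test when the observed effect equals the true effect."""
    z = standardized_effect * sqrt(sample_size)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

SMALL_EFFECT = 0.05  # hypothetical standardized effect, chosen for illustration

for sample_size in (100, 1_000, 10_000):
    print(sample_size, two_sided_p_value(SMALL_EFFECT, sample_size))
```

With a hundred subjects the effect is invisible; with ten thousand it is overwhelmingly significant, even though its practical size has not changed.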
I’m not going to get into a status competition with you over who is in a position to determine what.
OK, I will phrase it in different terms that make it explicit that I am making several claims here (one about what Bayesianism can determine, and one about what science can determine). It’s much like I said above:
It’s adequately suited for the accumulation of not-false beliefs, but it both could be better instrumentally designed for humans and is not the bedrock of thinking by which anything works. The thing that is essential to the method you described, “Scientists...have an informal sense of what P(A) is likely to be and are more inclined to question a conclusion if it is unlikely than if it is likely”. What abstraction describes the scientist’s thought process, the engine within the scientific method? I suggest it is Bayesian reasoning but even if it is not, one thing it cannot be is more of the Scientific method, as that would lead to recursion. If it is not Bayesian reasoning, then there are some things I am wrong about, and Bayesianism is a failed complete explanation, and the Scientific method is half of a quite adequate method—but they are still different from each other.
Some people claim Bayesian reasoning models intelligent agents’ learning about their environments, and agents’ deviations from it are failures to learn optimally. This model encompasses choosing when to use something like the scientific method and deciding when it is optimal to label beliefs not as “X% likely to be true, 1-X% likely to be untrue,” but rather “Good enough to rely on by virtue of being satisfactorily likely to be true,” and “Not good enough to rely on by virtue of not being satisfactorily likely to be true”. If Bayesianism is wrong, and it may be, it’s wrong.
The scientific method is a somewhat diverse set of particular labeling systems declaring ideas “Good enough to rely on by virtue of being satisfactorily likely to be true,” and “Not good enough to rely on by virtue of not being satisfactorily likely to be true.” Not only is the scientific method incomplete by virtue of using a black-box reasoning method inside of it, it doesn’t even claim to be able to adjudicate between circumstances in which it is to be used and in which it is not to be used. It is necessarily incomplete. Scientists’ reliance on intuition to decide when to use it and when not to may well be better than using Bayesian reasoning, particularly if Bayesianism is false, I grant that. But the scientific method doesn’t, correct me if I am wrong, purport to be able to formally decide whether or not a person should subject his or her religious beliefs to it.
The most obvious interpretation of your statement that science is “an imperfect human construct designed to accommodate the more biased of scientists” and that “it’s a cost and a deviation from ideal thinking to minimize the influence of scientists who receive no training in debiasing” is that you think your LW expertise means that you wouldn’t need those safeguards.
I disagree but here is a good example of where Bayesians can apply heuristics that aren’t first-order applications of Bayes rule. The failure mode of the heuristic is also easier to see than where science is accused of being too strict (though that’s really only a part of the total claim, the other parts are that science isn’t strict enough, that it isn’t near Pareto optimal according to its own tradeoffs in which it sacrifices truth, and that it is unfortunately taken as magical by its practitioners).
In those circumstances in which the Bayesian objection to science is that it is too strict, science can reply by ignoring that money is the unit of caring and declare its ideological purity and willingness to always sacrifice resources for greater certainty (such as when the sacrifice is withholding FDA approval of a drug already approved in Europe), “Either way you’re spending resources, but spending resources in the cause of epistemological purity is okay with me. Spending resources on junk because you are not practising the correct purification rituals is not.”
Here, however, the heuristic is “reading charitably”, in which the dangers of excess are really, really obvious. Nonetheless, even if I am wrong about what the best interpretation is, the extra-Bayesian ritual of reading (more) charitably would have had you thinking it more likely than you did that I had meant something more reasonable (and even more so, responding as if I did). It is logically possible that you were reading charitably ideally and my wording was simply terrible. This is a good example of how one can use heuristics other than Bayes’ rule once one discovers one is a human and therefore subject to bias. One can weigh the costs and benefits of it just like each feature of scientific testing.
For “an imperfect human construct designed to accommodate the more biased of scientists”, it would hardly do to assume scientists are all equally biased, and likewise for assuming the construct is optimal no matter the extent of bias in scientists. So the present situation could be improved upon by matching the social restrictions to the bias of scientists and also decreasing that bias. If mostly science isn’t strict enough, then perhaps it should be stricter in general (in many ways it should be) but the last thing to expect is that it is perfectly calibrated. It’s “imperfect”, I wouldn’t describe a rain dance as an “imperfect” method to get rain, it would be an “entirely useless” method. Science is “imperfect”, and it does very well to the extent thinking is warped to accommodate the more biased of scientists, and so something slightly different would be more optimal for the less biased ones.
“...it’s a cost and a deviation from ideal thinking to minimize the influence of scientists who receive no training in debiasing,” and less cost would be called for if they received such training, but not zero. Also, it is important to know that costs are incurred, lest evangelical pastors everywhere be correct when they declare science a “faith”. Science is roughly designed to prevent false things from being called “true” at the expense of true things not being called “true”. This currently occurs to different degrees in different sciences, and it should, and some of those areas should be stricter, and some should be less strict, and in all cases people shouldn’t be misled about belief such that they think there is a qualitative difference between a rigorously established base rate and one not so established, or science and predicting one’s child’s sickness when it vomits a certain color in the middle of the night.
My point is that the evidence on the table to support PUA theories is vulnerable to all the same problems as the evidence supporting claimed psychic powers
It’s not too similar since psychic powers have been found in controlled scientific studies, and they are (less than infinitely, but nearly) certainly not real. PUA theories were formed from people’s observations, then people developed ideas they thought were based on the theories, then tested what they thought were the ideas, and tested them insufficiently rigorously. Each such idea is barely more likely than the base rate to be correct because of all the failure modes, but each is more likely, the way barely enriched uranium’s particles are more likely to be U-235 than natural uranium’s are. This is in line with “However I also suspect that PUAs have no idea which bits of PUA are the efficacious bits and which are superstition, and that they could achieve the modest gains possible much faster if they knew which was which”.
When it comes to action, one should act on the ideas most likely to be true if one must act, all else equal. It is like the psychological experiments in which one is given a fixed amount of money for each correct guess of whether something is red or blue: having determined that 60% of the things are red, one should always guess red.
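The arithmetic behind the guessing experiment is short enough to check directly. This sketch compares always guessing red with "probability matching" (guessing red 60% of the time), a comparison strategy introduced here for illustration; it assumes each item is independently red with probability 0.6.

```python
P_RED = 0.6  # share of items that are red, as in the example above

def expected_accuracy(p_guess_red, p_red=P_RED):
    """Chance that a guess is correct when red is guessed with probability p_guess_red."""
    return p_guess_red * p_red + (1.0 - p_guess_red) * (1.0 - p_red)

always_red = expected_accuracy(1.0)            # commit to the likelier colour
probability_matching = expected_accuracy(0.6)  # guess red only 60% of the time

print(round(always_red, 2), round(probability_matching, 2))
```

Always guessing red is right 60% of the time, while matching the frequencies is right only 52% of the time (0.6 × 0.6 + 0.4 × 0.4), so acting on the most likely hypothesis wins even though it is far from certain.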
Any chance of turning this (and some of your other comments) into a top-level post? (perhaps something like, “When You Can (And Can’t) Do Better Than Science”?)
I think the first section should ignore the philosophy of science and cover the science of science, the sociology of it, and concede the sharpshooter’s fallacy, assuming that whatever science does it is trying to do. The task of improving upon the method is then not too normative, since one can simply achieve the same results with fewer resources/better results with the same resources. Also, that way science can’t blame perceived deficiencies on the methods of philosophy, as it could were one to evaluate science according to philosophy’s methods and standards. This section would be the biggest added piece of value that isn’t tying together things already on this site.
A section should look for edges with only one labeled node in the scientific methods where science requires input from a mystery method, such as how scientists generate hypotheses or how scientific revolutions occur. These show the incompleteness of the scientific method as a means to acquire knowledge, even if it is perfect at what it does. Formalization and improvement of the mystery methods would contribute to the scientific method, even if nothing formal within the model changes.
A section should discuss how science isn’t a single method (according to just about everybody), but instead a family of similar methods varying especially among fields. This weakens any claim idealizing science in general, as at most one could claim that a particular field’s method is ideal for human thought and discovery. Assuming each (or most) fields’ methods are ideal (this is the least convenient possible world for the critic of the scientific method as practiced), the costs and benefits of using that method rather than a related scientific method can be speculated upon. I expect to find, as policy debates should not be one sided, that were a field to use other fields’ methods it would have advantages and disadvantages; the simple case is choice of stricter p-value modulating wrong things believed at the expense of true things not believed.
Sections should discuss abuses of statistics, one covering violations of the law (failing to actually test P(B|~A) and instead testing P((B + (some random stuff) - (some other random stuff))|~A)) and another covering systemic failures such as publication bias and failure to publish replications. This would be a good place to introduce intra-scientific debates about such things to show both that science isn’t a monolithic outlook that can be supported and how one side in the civil war is aligned with Bayesian critiques. To the extent science is not settled on what the sociology of science is, that is a mark of weakness: it may be perfectly calibrated, but it isn’t too discriminatory here.
A concession I imagine pro-science people might make is to concede the weakness of soft science, such as sociology. Nonetheless, sociology’s scientific method is deeply related to hard sciences’, and its shortcomings somewhat implicate them. What’s more, if sociology is so weak, one wonders whence the pro-science person gets their strong pro-science view. One possibility is that they get it purely from philosophy of science, (a school of which) they wholly endorse, but if that is the case they don’t have an objection in kind to those who also predict science as is works decently but have severe criticisms of it and ideas on how to improve upon it, i.e. Bayesians.
I think it’s fair to contrast the scientific view of science with a philosophical view of Bayesianism to see if they are of the same scope. If science has no position on whether or not science is an approximation of Bayesian reasoning, and Bayesianism does, that is at least one question addressed by the one and not the other. It would be easy to invent a method that’s not useful for finding truth that has a broader scope than science, e.g. answering “yes” to every yes or no question unless it would contradict a previous response. This alone would show they are not synonymous.
A problem with the title “When You Can (And Can’t) Do Better Than Science” is that it is binary, but I really want three things explicitly expressed: 1) When you can do better than science by being stricter than science, 2) when you can do better than science by being more lenient than science, 3) when you can’t do better than science. The equivocation and slipperiness surrounding what it is reasonable to do is a significant part of the last category, e.g. one doesn’t drive where the Tappan Zee Bridge should have been built. The other part is near-perfect ways science operates now according to a reasonable use of “can’t”; I wouldn’t expect science to be absolutely and exactly perfect anywhere, any more than I can be absolutely sure with a probability of 1 that the Flying Spaghetti Monster doesn’t exist.
Second order Bayesianism deserves mention as the thing being advocated. A “good Bayesian” may use heuristics to counteract bias other than just Bayes’ rule, such as the principle of charity, or pretending things are magic to counteract the effort heuristic, or reciting a large number of variably sized numbers to counteract the anchoring effect, etc.
Is there a better analogy than the driving to the airport one for why Bayes’ Rule being part of the scientific toolbox doesn’t show the scientific toolbox isn’t a rough approximation of how to apply Bayes’ Rule? The other one I thought of is light’s exhibiting quantum behavior directly, it being a subset of all that is physical, but all that is physical actually embodying quantum behavior.
A significant confusion is discussing beliefs as if they weren’t probabilistic and actions in some domains as if they ought not be influenced by anything not in a category of true belief “scientifically established”. Bayesianism explains why this is a useful approximation of how one should actually act and thereby permits one to deviate from it without having to claim something like “science doesn’t work”.
Not necessarily to reopen anything, but some notes:
the placebo effect
I’m not sure it’s at all possible to debias against this.
The accepted scientific methodology is more like a safety rope or seat belt.
I agree that those are better metaphors than handcuffs all else equal, but those things would not prevent one from shooting one’s foot, and so it didn’t fit the broader metaphor.
A better analogy would be a law that no medical treatment can be received until a second opinion is obtained, or something like that.
I don’t think scientists think about it much. That’s more the sort of thing philosophers of science think about. The smarter scientists do what is essentially Bayesian updating, although very few of them would actually put a number on their prior and calculate their posterior based on a surprising p value. They just know that it takes a lot of very good evidence to overturn a well-established theory, and not so much evidence to establish a new claim consistent with the existing scientific knowledge.
Stating your hypothesis beforehand and specifying exactly what will and will not count as evidence before you collect your data is a very good way of minimising the effect of your own biases, but naughty scientists can and do take the opportunity to cook the experiment by strategically choosing what will count as evidence. Still, overall it’s better than letting scientists pore over the entrails of their experimental results and make up a hypothesis after the fact. If a great new hypothesis comes out of the data then you have to do your legwork and do a whole new experiment to test the new hypothesis, and that’s how it should be. If the effect is real it will keep. The universe won’t change on you.
It’s not a binary distinction. Rather, if you’re unaware of the ways that people’s P(B) estimates can be wildly inaccurate and think that your naive P(B) estimates are likely to be accurate then you can update into all sorts of stupid and factually false beliefs even if you’re an otherwise perfect Bayesian.
The people who think that John Edward can talk to dead people might well be perfect Bayesians who just haven’t checked to see what the probability is that John Edward could produce the effects he produces in a world where he can’t talk to dead people. If you think the things he does are improbable then it’s technically correct to update to a greater belief in the hypothesis that he can channel dead people. It’s only if you know that his results are exactly what you’d expect in a world where he’s a fake that you can do the correct thing, which is not update your prior belief that the probability that he’s a fake is 99.99...9%.
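To make the John Edward point concrete, here is a minimal sketch of the update in Python. The function name and every number in it are illustrative assumptions, not figures from anyone in this thread; the only thing it is meant to show is how much the answer hinges on the estimate of how likely the evidence is in a world where he's a fake, i.e. P(B|~A).

```python
def posterior(prior, p_b_given_a, p_b_given_not_a):
    """P(A|B) by Bayes' rule for a binary hypothesis A and evidence B."""
    # Total probability of seeing the evidence B at all.
    p_b = p_b_given_a * prior + p_b_given_not_a * (1.0 - prior)
    return p_b_given_a * prior / p_b


# Hypothetical prior that the medium really talks to the dead.
prior_psychic = 1e-6

# Naive observer: assumes a fake could almost never produce these hits.
naive = posterior(prior_psychic, p_b_given_a=0.9, p_b_given_not_a=0.001)

# Informed observer: knows cold reading produces the same hits just as often.
informed = posterior(prior_psychic, p_b_given_a=0.9, p_b_given_not_a=0.9)

print(f"naive posterior:    {naive:.2e}")
print(f"informed posterior: {informed:.2e}")
```

Both observers apply the rule correctly. The naive one ends up hundreds of times more confident than the prior, while the informed one does not move at all, because the evidence is equally likely under both hypotheses.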
If someone’s done some actual work to see if they can falsify the null hypothesis that PUA techniques are indistinguishable from a change, a comb, a shower and asking some women out, I’d be interested in seeing it. In the absence of such work I think good Bayesians have to recognise that they don’t have a P(B) with small enough error bars to be very useful.
Exactly, it’s a cost and a deviation from ideal thinking to minimize the influence of scientists who receive no training in debiasing. So not “If you’re doing Bayes right it’s the same as doing science”, where “science” is an imperfect human construct designed to accommodate the more biased of scientists.
These are costs. It’s important, and in some contexts cheap, to know why and how things work instead of saying “I’ll ignore that since enough replication always solves such problems,” when one doesn’t know in which cases one is doing nearly pointless extra work and in which one isn’t doing enough replication. It’s an obviously sub-optimal solution along the lines of “thinking isn’t important; assume infinite resources.”
It’s praise through faint damnation of the laws of logic that they don’t prevent one from shooting one’s own foot off. Handcuffs are even better at that task, but they are less useful for figuring out what is true.
Exactly, so in “some of the LW groupthink holds that you can do a valid Bayesian update in the absence of a rigorously established base rate,” they are right, and “updating is no better than guesswork in the absence of a rigorously obtained P(B),” is not always true, such as when the following condition doesn’t apply, and it doesn’t here:
What do you think this site is for? People are reading and sharing research papers about biases in their free time. One could likewise criticize jet fuel for being inappropriate for an old fashioned coal powered locomotive. Yes, jet fuel will explode a train...this is not a flaw of jet fuel, and it does not mean that the coal-train is better at transporting things.
That’s not the claim in question.
In any case, there are better ways to think about this subject than with null hypotheses. Those are social constructs focusing (decently) on optimizing the prevention of belief in untrue things, rather than on determining what’s most likely true. Here, false beliefs have relatively less cost than in most of science, and will in any case only be held probabilistically.
There’s a very good reason why we do double-blind, placebo-controlled trials rather than just recruiting a bunch of people who browse LW to do experiments with, on the basis that since LWers are “trained in debiasing” they are immune to wishful thinking, confirmation bias, the experimenter effect, the placebo effect and so on.
I have a great deal more faith in methodological constructs that make it impossible for bias to have an effect than in people’s claims to “debiased” status.
Don’t get me wrong, I think that training in avoiding cognitive biases is very important because there are lots of important things we do where we don’t have the luxury of specifying our hypotheses in strictly instrumental terms beforehand, collecting data via suitably blinded proxies and analysing it just in terms of our initial hypothesis.
However my view is that if you think that scientific methodology is just a set of training wheels for people who haven’t clicked on all the sequences yet and that browsing LW makes you immune to the problems that scientific methodology exists specifically to prevent then it’s highly likely you overestimate your resistance to bias.
There’s also a cost to acting on the assumption that every correlation is meaningful in a world where we have so much data available to us that we can find arbitrarily large numbers of spurious correlations at P<0.01 if we try hard enough. Either way you’re spending resources, but spending resources in the cause of epistemological purity is okay with me. Spending resources on junk because you are not practising the correct purification rituals is not.
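A rough sketch of the arithmetic behind the spurious-correlation worry, in Python. It assumes every tested correlation is truly null and that the tests are independent, which real data sets will not satisfy exactly; the count of 1000 tests is an arbitrary illustrative choice.

```python
def expected_false_positives(num_tests, alpha):
    """Expected number of true-null tests that still reach significance."""
    return num_tests * alpha


def prob_at_least_one(num_tests, alpha):
    """Chance that at least one of num_tests independent null tests
    comes out significant at level alpha."""
    return 1.0 - (1.0 - alpha) ** num_tests


ALPHA = 0.01       # the P<0.01 threshold mentioned above
NUM_TESTS = 1000   # hypothetical number of correlations trawled through

expected = expected_false_positives(NUM_TESTS, ALPHA)
p_any = prob_at_least_one(NUM_TESTS, ALPHA)

print(f"expected spurious hits: {expected:.1f}")
print(f"P(at least one hit):    {p_any:.5f}")
```

Under these assumptions a trawl through a thousand unrelated variables is all but guaranteed to turn up something that looks significant, with about ten such hits expected.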
The accepted scientific methodology is more like a safety rope or seat belt. Sometimes annoying, almost always rational.
Rather than what a site is for I focus on what a site is.
In many, many ways this site has higher quality discourse than, say, the JREF forums and a population who on average are better versed in cognitive biases. However this discussion has made it obvious to me that on average the JREF forumites are far more aware than the LWers of the various ways that people’s estimates of P(B) can be wrong and can be manipulated.
They would never put it in those terms since Bayes is a closed book to them, but they are very well aware that you can work yourself into completely wrong positions if you aren’t sophisticated enough to correctly estimate the actual base rate at which one would expect to observe things like homeopathy apparently working, people apparently talking to the dead, people apparently having psychic powers, NLP apparently letting you seduce people and so on in worlds where none of these things did anything except act as placebos (at best).
If your P(B) is off then using Bayes’ Theorem is just being a mathematically precise idiot instead of an imprecise idiot. You’ll get to exactly the right degree of misguided belief, based on the degree to which you’re mistaken about the correct value of P(B), but that’s still far worse than being someone who wouldn’t know Bayes from a bar of soap but who intuitively perceives something closer to the correct P(B).
The idea that LW browsers think they are liquid-fuelled jets while the scientists who do the actual work of moving society forward are boring old coal trains worries me. I think of LW’s “researchers” as a bunch of enthusiastic amateurs with cheap compasses and hand-drawn maps running around in the bushes in a mildly organised fashion, while scientists are painstakingly and one inch at a time building a gigantic sixteen-lane highway for us all to drive down.
Yes, and people who actually understand the tradeoffs in using formal scientific reasoning and its deviations from the laws of reasoning are the only people in position to intelligently determine that. Those who say “always use the scientific method for important things” or, though I don’t know that there ever has been or ever will be such a person, “always recruit a bunch of people who browse LW,” are not thinking any more than a broken clock is ticking. As an analogy, coal trains are superior to jet planes for transporting millions of bushels of wheat from Alberta to Toronto. It would be inane and disingenuous for broken records always calling for the use of coal trains to either proclaim their greater efficiency in determining which vehicle to use to transport things because they got the wheat case right or pretend that they have a monopoly on calling for the use of trains.
With reasoning, one can intelligently determine a situation’s particulars and spend to eliminate a bias (for example by making a study double-blind) rather than doing that all the time or relying on skill in this case, and without relying on intuition to determine when. One can see that in an area, the costs of thinking something true when it isn’t exceed the costs of thinking it’s false when it’s true, and set up correspondingly strict protocols, rather than blindly always paying in true things not believed, time, and money for the same, sometimes inadequate and sometimes excessive, amount of skepticism.
My view is that if you think anyone who has interacted with you in this thread has that view you have poor reading comprehension skills.
So one can simply...not do that. And be a perfectly good Bayesian.
It is not the case that every expenditure reducing the likelihood that something is wrong is optimal, as one could instead spend a bit on determining which areas ought to have extra expenditure reducing the likelihood that something is wrong there.
In any case, science has enshrined a particular few levels of spending on junk that it declares perfectly fine because the “correct” purification rituals have been done. I do not think that such spending on junk is justified because in those cases no, science is not strict enough. One can declare a set of arbitrary standards and declare spending according to them correct and ideologically pure or similar, but as one is spending fungible resources towards research goals this is spurious morality.
Amazing, let me try one. If a Bayesian reasoner is hit by a meteor and put into a coma, he is worse off than a non-Bayesian who stayed indoors playing Xbox games and was not hit by a meteor. So we see that Bayesian reasoning is not sufficient to confer immortality and transcendence into a godlike being made of pure energy.
People on this site are well aware that if scientific studies following the same rules as the rest of science indicate that people have psychic powers, there’s something wrong with the scientific method and the scientists’ understanding of it because the notion that people have psychic powers are bullshit.
People here know that there is not some ineffable magic making science the right method in the laboratory and faith the right method in church, or science the right method in the laboratory and love the right method everywhere else, science the right method everywhere and always, etc., as would have been in accordance with people’s intuitions.
How unsurprising it is that actually understanding the benefits and drawbacks of science leads one to conclude that often science is not strict enough, and often too strict, and sometimes but rarely entirely inappropriate when used, and sometimes but rarely unused when it should be used, when heretofore everything was decided by boggling intuition.
Grammar nitpick: should be “is bullshit,” referring to the singular “notion.”
I’m not going to get into a status competition with you over who is in a position to determine what.
The most obvious interpretation of your statement that science is “an imperfect human construct designed to accommodate the more biased of scientists” and that “it’s a cost and a deviation from ideal thinking to minimize the influence of scientists who receive no training in debiasing” is that you think your LW expertise means that you wouldn’t need those safeguards. If I misinterpreted you I think it’s forgivable given your wording, but if I misinterpreted you then please help me out in understanding what you actually meant.
I’m responding under the assumption that the second “scientific” should read “psychic”. My point was not that people here didn’t get that—I imagine they all do. My point is that the evidence on the table to support PUA theories is vulnerable to all the same problems as the evidence supporting claimed psychic powers, and that when it came to this slightly harder problem some people here seemed to think that the evidence on the table for PUA was actually evidence we would not expect to see in a world where PUA was placebo plus superstition.
I think the JREF community would take one sniff of PUA and say “Looks like a scam based on a placebo”, and that they would be better Bayesians when they did so than anyone who looks at the same evidence and says “Seems legit!”.
(I suspect that the truth is that PUA has a small non-placebo effect, since we live in a universe with ample evidence that advertising and salesmanship have small non-placebo effects that are statistically significant if you get a big enough sample size. However I also suspect that PUAs have no idea which bits of PUA are the efficacious bits and which are superstition, and that they could achieve the modest gains possible much faster if they knew which was which).
OK, I will phrase it in different terms that make it explicit that I am making several claims here (one about what Bayesianism can determine, and one about what science can determine). It’s much like I said above:
Some people claim Bayesian reasoning models intelligent agents’ learning about their environments, and agents’ deviations from it are failures to learn optimally. This model encompasses choosing when to use something like the scientific method and deciding when it is optimal to label beliefs not as “X% likely to be true, 1-X% likely to be untrue,” but rather “Good enough to rely on by virtue of being satisfactorily likely to be true,” and “Not good enough to rely on by virtue of being satisfactorily likely to be true”. If Bayesianism is wrong, and it may be, it’s wrong.
The scientific method is a somewhat diverse set of particular labeling systems declaring ideas “Good enough to rely on by virtue of being satisfactorily likely to be true,” and “Not good enough to rely on by virtue of being satisfactorily likely to be true.” Not only is the scientific method incomplete by virtue of using a black-box reasoning method inside of it, it doesn’t even claim to be able to adjudicate between circumstances in which it is to be used and in which it is not to be used. It is necessarily incomplete. Scientists’ reliance on intuition to decide when to use it and when not to may well be better than using Bayesian reasoning, particularly if Bayesianism is false, I grant that. But the scientific method doesn’t, correct me if I am wrong, purport to be able to formally decide whether or not a person should subject his or her religious beliefs to it.
I disagree but here is a good example of where Bayesians can apply heuristics that aren’t first-order applications of Bayes rule. The failure mode of the heuristic is also easier to see than where science is accused of being too strict (though that’s really only a part of the total claim, the other parts are that science isn’t strict enough, that it isn’t near Pareto optimal according to its own tradeoffs in which it sacrifices truth, and that it is unfortunately taken as magical by its practitioners).
In those circumstances in which the Bayesian objection to science is that it is too strict, science can reply by ignoring that money is the unit of caring and declare its ideological purity and willingness to always sacrifice resources for greater certainty (such as when the sacrifice is withholding FDA approval of a drug already approved in Europe), “Either way you’re spending resources, but spending resources in the cause of epistemological purity is okay with me. Spending resources on junk because you are not practising the correct purification rituals is not.”
Here, however, the heuristic is “reading charitably”, in which the dangers of excess are really, really obvious. Nonetheless, even if I am wrong about what the best interpretation is, the extra-Bayesian ritual of reading (more) charitably would have had you thinking it more likely than you did that I had meant something more reasonable (and even more so, responding as if I did). It is logically possible that you were reading charitably ideally and my wording was simply terrible. This is a good example of how one can use heuristics other than Bayes’ rule once one discovers one is a human and therefore subject to bias. One can weigh the costs and benefits of it just like each feature of scientific testing.
For “an imperfect human construct designed to accommodate the more biased of scientists”, it would hardly do to assume scientists are all equally biased, and likewise for assuming the construct is optimal no matter the extent of bias in scientists. So the present situation could be improved upon by matching the social restrictions to the bias of scientists and also decreasing that bias. If mostly science isn’t strict enough, then perhaps it should be stricter in general (in many ways it should be) but the last thing to expect is that it is perfectly calibrated. It’s “imperfect”, I wouldn’t describe a rain dance as an “imperfect” method to get rain, it would be an “entirely useless” method. Science is “imperfect”, and it does very well to the extent thinking is warped to accommodate the more biased of scientists, and so something slightly different would be more optimal for the less biased ones.
“...it’s a cost and a deviation from ideal thinking to minimize the influence of scientists who receive no training in debiasing,” and less cost would be called for if they received such training, but not zero. Also, it is important to know that costs are incurred, lest evangelical pastors everywhere be correct when they declare science a “faith”. Science is roughly designed to prevent false things from being called “true” at the expense of true things not being called “true”. This currently occurs to different degrees in different sciences, and it should, and some of those areas should be stricter, and some should be less strict, and in all cases people shouldn’t be misled about belief such that they think there is a qualitative difference between a rigorously established base rate and one not so established, or science and predicting one’s child’s sickness when it vomits a certain color in the middle of the night.
It’s not too similar since psychic powers have been found in controlled scientific studies, and they are (less than infinitely, but nearly) certainly not real. PUA theories were formed from people’s observations, then people developed ideas they thought were based on the theories, then tested what they thought were the ideas, and tested them insufficiently rigorously. Each such idea is barely more likely than the base rate for being correct due to all the failure nodes, but each is more likely, the way barely enriched uranium’s particles are more likely to be U-235 than natural uranium’s are. This is in line with “However I also suspect that PUAs have no idea which bits of PUA are the efficacious bits and which are superstition, and that they could achieve the modest gains possible much faster if they knew which was which”.
When it comes to action, consider psychological experiments in which one is given a fixed amount of money for correctly guessing whether something is red or blue. If one determines that 60% of the things are red, one should always guess red. Likewise, if one must act, one should act upon the ideas most likely to be true, all else equal.
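The red/blue case can be checked with a few lines of Python. This is only a sketch under the stated setup (60% of items red, a fixed reward per correct guess, each guess made independently of the item's actual color); the function name is made up for illustration.

```python
def expected_accuracy(p_red, p_guess_red):
    """Chance of a correct guess when a fraction p_red of items are red
    and the guesser independently says "red" with probability p_guess_red."""
    return p_red * p_guess_red + (1.0 - p_red) * (1.0 - p_guess_red)


P_RED = 0.6

# Strategy 1: always bet on the more likely color.
always_red = expected_accuracy(P_RED, 1.0)

# Strategy 2: "probability matching", guessing red 60% of the time.
matching = expected_accuracy(P_RED, P_RED)

print(f"always guess red:     {always_red:.2f}")
print(f"probability matching: {matching:.2f}")
```

Always guessing red is right 60% of the time, while matching one's guesses to the observed frequencies is right only 52% of the time, which is why one acts on the most likely idea even when it is only somewhat more likely than the alternative.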
Any chance of turning this (and some of your other comments) into a top-level post? (perhaps something like, “When You Can (And Can’t) Do Better Than Science”?)
Yes.
I think the first section should ignore the philosophy of science and cover the science of science, the sociology of it, and concede the sharpshooter’s fallacy, assuming that whatever science does it is trying to do. The task of improving upon the method is then not too normative, since one can simply achieve the same results with fewer resources/better results with the same resources. Also, that way science can’t blame perceived deficiencies on the methods of philosophy, as it could were one to evaluate science according to philosophy’s methods and standards. This section would be the biggest added piece of value that isn’t tying together things already on this site.
A section should look for edges with only one labeled node in the scientific methods where science requires input from a mystery method, such as how scientists generate hypotheses or how scientific revolutions occur. These show the incompleteness of the scientific method as a means to acquire knowledge, even if it is perfect at what it does. Formalization and improvement of the mystery methods would contribute to the scientific method, even if nothing formal within the model changes.
A section should discuss how science isn’t a single method (according to just about everybody), but instead a family of similar methods varying especially among fields. This weakens any claim idealizing science in general, as at most one could claim that a particular field’s method is ideal for human thought and discovery. Assuming each (or most) fields’ methods are ideal (this is the least convenient possible world for the critic of the scientific method as practiced), the costs and benefits of using that method rather than a related scientific method can be speculated upon. I expect to find, as policy debates should not be one sided, that were a field to use other fields’ methods it would have advantages and disadvantages; the simple case is choice of stricter p-value modulating wrong things believed at the expense of true things not believed.
Sections should discuss abuses of statistics, one covering violations of the law (failing to actually test P(B|~A) and instead testing P(B + (some random stuff) - (some other random stuff)|~A)), and another covering systemic failures such as publication bias and failure to publish replications. This would be a good place to introduce intra-scientific debates about such things to show both that science isn’t a monolithic outlook that can be supported and how one side in the civil war is aligned with Bayesian critiques. To the extent science is not settled on what the sociology of science is, that is a mark of weakness: it may be perfectly calibrated, but it isn’t too discriminatory here.
A concession I imagine pro-science people might make is to concede the weakness of soft science, such as sociology. Nonetheless, sociology’s scientific method is deeply related to hard sciences’, and its shortcomings somewhat implicate them. What’s more, if sociology is so weak, one wonders whence the pro-science person gets their strong pro-science view. One possibility is that they get it purely from philosophy of science, (a school of which) they wholly endorse, but if that is the case they don’t have an objection in kind to those who also predict science as is works decently but have severe criticisms of it and ideas on how to improve upon it, i.e. Bayesians.
Thoughts?