I suppose it could be interpreted that way. It’s not like anyone wants research to shut down—everyone agrees that research should continue. We’ve talked about much more divisive things in the past.
Einstein was more productive at producing scientific breakthroughs while he worked in a patent office in 1905 than when he held big grants.
It’s not at all clear that the cycle of writing grants to run fancy experiments, then publishing papers that don’t replicate in order to keep a publication rate high enough to win further grants, actually helps the scientific project.
Bad example. Einstein was A) doing physics at a time when the size of budgets needed to make new discoveries was much smaller B) primarily doing theoretical work or work that relied on other people’s data. Many areas of research (e.g. much of particle physics, a lot of condensed matter, most biology) require funding simply to do anything at all.
A) doing physics at a time when the size of budgets needed to make new discoveries was much smaller
I don’t think that’s true.
Take something like the highly useful discovery that taking vitamin D in the morning is more effective than taking it in the evening: that discovery was made in the last decade by amateurs without budgets.
Fermi estimates aren’t easy, but that discovery might be worth a year of lifespan. If you look at what the Google people are saying, solving cancer is worth three years of lifespan. The people who publish breakthrough results in cancer research have replication rates of under 10 percent.
Just as Petrov didn’t get a Nobel Peace Prize, the people advancing human health don’t get Nobel Prizes in biology.
Relying on other people’s data is much easier now than it was in Einstein’s time. Open science doesn’t go as far as I would like, but being able to transfer data easily via computers makes things so much easier.
The fact that most work in biology relies on experiments suggests that there are not enough people doing good theoretical work in the field. I don’t know much about particle physics, but I’m not sure whether we need as many smart people doing particle physics as we have at the moment.
So there are two distinct arguments being made: one is a resource allocation argument (it would be better to spend fewer resources right now on things like particle physics) and the second argument is that in many fields one can still make discoveries with few resources. The first argument may have some validity. The second argument ignores how much work is required in most cases. Yes, one can do things like investigate specific vitamin metabolism issues. But if one is interested in, say, synthesizing new drugs, or in investigating how those drugs would actually impact people, that requires large-scale experiments.
The fact that most work in biology relies on experiments suggests that there are not enough people doing good theoretical work in the field.
That’s not what is going on here. The issue is that biology is complicated. Life doesn’t have simple systems with clean theoretical underpinnings that can be easily computed. There are literally thousands of distinct chemicals in a cell interacting, and when you introduce a new one, even if you’ve designed it to interact with a specific receptor, it will often impact others. And even if it does only impact the receptor in question, how it does so will matter. You are dealing with systems created by the blind-idiot god.
You are defending a way of doing biology that is plagued by various problems. It’s a field where people literally believe that they can perceive more when they blind themselves.
There are huge issues in the theoretical underpinnings of that approach, because the people in the system are too busy writing equipment-intensive research for top-tier journals that doesn’t replicate, instead of thinking more about how to approach the field.
So every field has problems, but that doesn’t mean those problems are “huge”.
There are huge issues in the theoretical underpinnings of that approach, because the people in the system are too busy writing equipment-intensive research for top-tier journals that doesn’t replicate, instead of thinking more about how to approach the field.
Outside view: either an entire field which is generally pretty successful at actually finding out what is going on is fundamentally misguided about how it should be approaching its subject, or the biologists are doing what they can. Biology is hard. But we are making progress in biology at a rapid rate. For example, the use of genetic markers to figure out how to treat different cancers was first proposed in the early 1990s and is now a highly successful clinical method.
For example, the use of genetic markers to figure out how to treat different cancers was first proposed in the early 1990s and is now a highly successful clinical method.
Really? Can you point to a paper demonstrating it’s better than classifying cancers the way histologists did in the 80s? Everything I’ve seen says that it just reconstructs the same classification. But it took ten years for the geneticists to admit that. I’ve seen more recent genetic classification that might be better than the old ones, but they didn’t bother to compare to the old genetic classifications, let alone the histology.
HER2 receptor. These days those with breast cancer that overexpresses this growth factor receptor tend to get monoclonal antibodies against it, which both suppress its growth effects and tag it for disruption by the immune system.
Yes, this is a protein test rather than a genetic test. But it lets the subset of people with this amplification get a treatment that has a large positive absolute effect on those with early-stage cancer.
I don’t know enough about that subfield to answer that question. If what you are saying is accurate, that’s highly disturbing. Most of my exposure to that subfield has been to popular press articles such as this one which paint a picture that sounds much more positive, but may well be highly distorted from what’s actually going on.
But we are making progress in biology at a rapid rate. For example, the use of genetic markers to figure out how to treat different cancers was first proposed in the early 1990s and is now a highly successful clinical method.
That’s a crude method of measuring success.
The cost of new drugs rises exponentially via Eroom’s law. Big Pharma constantly lays off people.
A problem like obesity grows worse over the years instead of seeing progress. Diabetes gets worse.
Even if you say that science isn’t about solving real-world issues but about knowledge, I also think that replication rates of 11% in the case of breakthrough cancer research indicate that the field is not good at finding out what’s going on.
Even if you say that science isn’t about solving real-world issues but about knowledge, I also think that replication rates of 11% in the case of breakthrough cancer research indicate that the field is not good at finding out what’s going on.
I don’t think a flat replication rate of 11% tells us anything without recourse to additional considerations. It’s sort of like a Umeshism: if your experiments are not routinely failing, you aren’t really experimenting. The best we can say is that 0% and 100% are both suboptimal...
For example, if I was told that anti-aging research was having an 11% replication rate for its ‘stopping aging’ treatments, I would regard this as shockingly too high and a collective crime on par with the Nazis, and if anyone asked me, would tell them that we need to spend far, far more on anti-aging research because we clearly are not trying nearly enough crazy ideas. And if someone told me the clinical trials for curing balding were replicating at 89%, I would be a little uneasy and wonder what side-effects we were exposing all these people to.
(Heck, you can’t even tell much about the quality of the research from just a flat replication rate. If the prior odds are 1 in 10,000, then 11% looks pretty damn good. If the prior odds are 1 in 5, pretty damn bad.)
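To make that concrete, here is a minimal sketch of the point, using the standard screening model with α = 0.05 and power 0.8 (both illustrative conventional values of my own choosing, not measured from any field):

```python
# Standard screening model: PPV = fraction of published positive
# results that reflect a real effect, given the prior probability of a
# true hypothesis, false-positive rate alpha, and power.
# alpha and power are illustrative conventional values, not measured ones.

def ppv(prior, alpha=0.05, power=0.8):
    true_pos = power * prior
    false_pos = alpha * (1 - prior)
    return true_pos / (true_pos + false_pos)

# Prior odds of 1:10,000 -> almost all positives are false,
# so an 11% replication rate would look surprisingly good:
print(round(ppv(1 / 10001), 4))   # ~0.0016

# Prior odds of 1:5 -> most positives should be real,
# so 11% would look very bad:
print(round(ppv(1 / 6), 2))       # ~0.76
```

The same 11% thus sits far above one baseline and far below the other, which is the whole point about prior odds.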
What I would accept as a useful invocation of an 11% rate is, say, an economic analysis of the benefits showing that this represents over-investment (for example, falling pharmacorp share prices) or surprise by planners/scientists/CEOs/bureaucrats where they had held more optimistic assumptions (and so investment is likely being wasted). That sort of thing.
Replication rate of experiments is quite different from the success rate of experiments.
An 11% success rate is often shockingly high. An 11% replication rate means the researchers are sloppy, value publishing over confidence in the results, and likely do way too much of throwing spaghetti at the wall...
Even granting your distinction, the exact same argument still applies: just substitute in an additional rate of, say, a 10% chance of going from replication to whatever you choose to define as ‘success’. You cannot say that an 11% replication rate and then a 1.1% success rate is optimal—or suboptimal—without doing more intellectual work!
No, I don’t think so. An 11% replication rate means that 89% of the published results are junk and external observers have no problems seeing that. Which implies that if those who published it were a bit more honest/critical/responsible, they should have been able to do a better job of controlling for the effects which led them to think there’s statistical significance when in fact there’s none.
If the prior odds are 1:10,000 you have no business publishing results at 0.05 confidence level.
An 11% replication rate means that 89% of the published results are junk and external observers have no problems seeing that.
Yes, so? As Edison said, I have discovered 999 ways to not build a lightbulb.
Which implies that if those who published it were a bit more honest/critical/responsible, they should have been able to do a better job of controlling for the effects which led them to think there’s statistical significance when in fact there’s none.
Huh? No. As I already said, you cannot go from replication rate to judgment of the honesty, competency, or insight of researchers without additional information. Most obviously, it’s going to be massively influenced by the prior odds of the hypotheses.
If the prior odds are 1:10,000 you have no business publishing results at 0.05 confidence level.
No one has any business publishing at an arbitrary confidence level, which should be chosen with respect to some even half-assed decision analysis. 1:10,000 or 1:1000, doesn’t matter.
As Edison said, I have discovered 999 ways to not build a lightbulb.
You’re still ignoring the difference between a failed experiment and a failed replication.
Edison did not publish 999 papers each of them claiming that this is the way to build the lightbulb (at p=0.05).
you cannot go from replication rate to judgment of the honesty, competency, or insight of researchers without additional information. Most obviously, it’s going to be massively influenced by the prior odds of the hypotheses.
And what exactly prevents the researchers from considering the prior odds when they are trying to figure out whether their results are really statistically significant?
I disagree with you—if a researcher consistently publishes research that cannot be replicated I will call him a bad researcher.
You’re still ignoring the difference between a failed experiment and a failed replication. Edison did not publish 999 papers each of them claiming that this is the way to build the lightbulb (at p=0.05).
So? What does this have to do with my point about optimizing return from experimentation?
And what exactly prevents the researchers from considering the prior odds when they are trying to figure out whether their results are really statistically significant?
Nothing. But no one does that because to point out that a normal experiment has resulted in a posterior probability of <5% is not helpful since that could be said of all experiments, and to run a single experiment so high-powered that it could single-handedly overcome the prior probability is ludicrously wasteful. You don’t run a $50m clinical trial enrolling 50,000 people just because some drug looks interesting.
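A quick back-of-envelope sketch (my own purely illustrative numbers) shows why: even if a single significant result is granted a likelihood ratio of about 20 in favor of the hypothesis, roughly the most charitable reading of p = 0.05 with high power, it barely moves a 1:10,000 prior.

```python
# Bayesian update by odds: posterior odds = prior odds * likelihood ratio.
# The likelihood ratio of 20 is an assumed, generous stand-in for what a
# single p = 0.05 result can be worth; nothing here is from real data.

def posterior_prob(prior_odds, likelihood_ratio):
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Prior odds 1:10,000, one significant experiment:
print(posterior_prob(1 / 10000, 20))   # ~0.002, still well under 1%

# To reach even 50% in one shot, the experiment would need a
# likelihood ratio of about 10,000, i.e. absurdly high power.
```

Hence the posterior of any single ordinary experiment on a long-shot hypothesis stays far below 5%, no matter how honestly it is reported.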
I disagree with you—if a researcher consistently publishes research that cannot be replicated I will call him a bad researcher.
I think our disagreement comes (at least partially) from different views on what publishing research means.
I see your position as treating publishing as something like: “We did A, B, and C. We got the results X and Y. Take it for what it is. The end.”
I see publishing more like this: “We did multiple experiments which did not give us the magical 0.05 number, so we won’t tell you about them. But hey, try #39 succeeded and we can publish it: we did A39, B39, and C39 and got the results X39 and Y39. The results are significant, so we believe them to be meaningful and reflective of actual reality. Please give our drug to your patients.”
The realities of scientific publishing are unfortunate (and yes, I know of efforts to ameliorate the problem in medical research). If people published all their research (“We did 50 runs with the following parameters, all failed, sure #39 showed statistical significance but we don’t believe it”) I would have zero problems with it. But that’s not how the world currently works.
P.S. By the way, here is some research which failed replication (via this)
The realities of scientific publishing are unfortunate (and yes, I know of efforts to ameliorate the problem in medical research). If people published all their research (“We did 50 runs with the following parameters, all failed, sure #39 showed statistical significance but we don’t believe it”) I would have zero problems with it. But that’s not how the world currently works.
That would be a better world. But in this world, it would still be true that there is no universal, absolute, optimal percentage of experiments failing to replicate, and the optimal percentage is set by decision-theoretic/economic concerns.
Experiments that fail to replicate at percentages greater than those expected from published confidence values (say, posterior probabilities) are evidence that the published confidence values are wrong.
A research process that consistently produces wrong confidence values has serious problems.
Experiments that fail to replicate at percentages greater than those expected from published confidence values (say, posterior probabilities) are evidence that the published confidence values are wrong.
How would you know? People do not produce posterior probabilities or credible intervals, they produce confidence intervals and p-values.
Either the p-values in the papers are worthless in the sense of not reflecting the probability that the observed effect is real—in which case the issue in the parent post stands.
Or the p-values, while not perfect, do reflect the probability the effect is real—in which case they are falsified by the replication rates and in which case the issue in the parent post stands.
Either the p-values in the papers are worthless in the sense of not reflecting the probability that the observed effect is real
p-values do not reflect the probability that the observed effect is real but the inverse, and no one has ever claimed that, so we can safely dismiss this entire line of thought.
Or the p-values, while not perfect, do reflect the probability the effect is real
p-values can, with some assumptions and choices, be used to calculate other things like positive predictive value/PPV, which are more meaningful. However, the issue still stands. Suppose a field’s studies have a PPV of 20%. Is this good or bad? I don’t know—it depends on the uses you intend to put it to and the loss function on the results.
Maybe it would be helpful if I put it in Bayesian terms where the terms are more meaningful & easier to understand. Suppose an experiment turns in a posterior with 80% of the distribution >0. Subsequent experiments or additional data collection will agree with and ‘replicate’ this result the obvious amount.
Now, was this experiment ‘underpowered’ (it collected too little data and is bad) or ‘overpowered’ (too much and inefficient/unethical) or just right? Was this field too tolerant of shoddy research practices in producing that result?
Well, if the associated loss function has a high penalty on true values being <0 (because the cancer drugs have nasty side-effects and are expensive and only somewhat improve on the other drugs) then it was probably underpowered; if it has a small loss function (because it was a website A/B test and you lose little if it was a worse variant) then it was probably overpowered, because you spent more traffic/samples than you needed to choose a variant.
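The dependence on the loss function can be put into a toy expected-value calculation (the gains and losses are entirely made-up illustrative units; the 0.8 is the posterior from the example above):

```python
# Toy decision rule: act on the result if the expected value is positive.
# p_effect: posterior probability the effect is really > 0 (0.8 here,
# matching the example); the gain/loss magnitudes are invented.

def expected_value_of_acting(p_effect, gain_if_real, loss_if_not):
    return p_effect * gain_if_real - (1 - p_effect) * loss_if_not

# Cancer drug: large penalty if the true effect is negative
# (side-effects, expense). EV < 0, so the study was underpowered:
print(expected_value_of_acting(0.8, 1.0, 10.0))   # about -1.2

# Website A/B test: picking the worse variant costs little.
# EV > 0, so 80% was plenty (arguably overpowered):
print(expected_value_of_acting(0.8, 1.0, 0.5))    # about 0.7
```

The identical 80% posterior flips from “not enough data” to “more than enough” purely as the penalty for being wrong changes.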
The ‘replication crises’ are a ‘crisis’ in part because people are basing meaningful decisions on the results to an extent that cannot be justified if one were to explicitly go through a Bayesian & decision theory analysis with informative data. E.g. pharmacorps probably should not be spending millions of dollars to buy and do preliminary trials on research which is not much distinguishable from noise, as they have learned to their intense frustration & financial cost, to say nothing of diet research. If the results did not matter to anyone, then it would not be a big deal if the PPV were 5% rather than 50%: the researchers would cope, and other people would not make costly suboptimal decisions.
There is no single replication rate which is ideal for cancer trials and GWASes and individual differences psychology research and taxonomy and ecology and schizophrenia trials and...
It isn’t a metric of success. It is an example, one of many in the biological sciences.
The cost of new drugs rises exponentially via Eroom’s law.
This is likely due largely to policy issues and legal issues more than it is how the biologists are thinking. Clinical trials have gotten large.
A problem like obesity grows worse over the years instead of seeing progress. Diabetes gets worse.
A systemic problem, but one that has even less to do with biological research than Eroom’s law. Obesity is not due to a lack of theoretical underpinnings in biology.
Even if you say that science isn’t about solving real-world issues but about knowledge, I also think that replication rates of 11% in the case of breakthrough cancer research indicate that the field is not good at finding out what’s going on.
The question isn’t whether the field is very good. The question is whether the problems which we both agree exist are due at all to not enough theory. File drawer effects, cognitive biases, and bad experimental design are all issues here, none of which fall into that category.
It isn’t a metric of success. It is an example, one of many in the biological sciences.
Then on what grounds do you claim that the field is successful? How would you know if it weren’t successful?
Obesity is not due to a lack of theoretical underpinnings in biology.
I’m not saying that the field lacks theoretical underpinnings but that the underpinnings it has are of bad quality.
The question isn’t whether the field is very good. The question is whether the problems which we both agree exist are due at all to not enough theory. File drawer effects, cognitive biases, and bad experimental design are all issues here, none of which fall into that category.
Questions about designing experiments in a way that they produce reproducible results instead of merely impressive p-values are theoretical issues.
The question is whether the problems which we both agree exist are due at all to not enough theory.
“Enough theory” sounds like an attempt to quantify the amount of theory. That’s not what I advocate. Theories don’t get better through an increase in their quantity. Good theoretical thinking can simplify models and result in less complex theory.
Then on what grounds do you claim that the field is successful? How would you know if it weren’t successful?
That’s a good question, but in this context, seeing a variety of novel discoveries in the last few years indicates a somewhat successful field. By the same token, I’m curious what makes you think this isn’t a successful field?
Questions about designing experiments in a way that they produce reproducible results instead of merely impressive p-values are theoretical issues.
I’ve already mentioned the file drawer problem. I’m curious, do you think that is a theoretical problem? If so, this may come down in part to very different notions of what theory means.
Theories don’t get better through an increase in their quantity. Good theoretical thinking can simplify models and result in less complex theory.
You seem to be treating biology to some extent like it is physics. But these are complex systems. What makes you think that such approaches will be at all successful?
That’s a good question, but in this context, seeing a variety of novel discoveries in the last few years indicates a somewhat successful field. By the same token, I’m curious what makes you think this isn’t a successful field?
The fact that Big Pharma has to lay off a lot of scientists is a real-world indication that the model of finding a drug target, screening thousands of compounds against it, running those compounds through clinical trials to find out whether they are any good, and then coming out at the other end with drugs that cure important illnesses has stopped producing results. Eroom’s law.
I’ve already mentioned the file drawer problem. I’m curious, do you think that is a theoretical problem?
Saying that there’s a file drawer problem is quite easy. That, however, is not a solution. I think your problem is that you can’t imagine a theory that would solve the problem. That’s typical. If it were easy to imagine a theoretical breakthrough beforehand, it wouldn’t be much of a breakthrough.
Look at the theoretical breakthrough of moving from a model of numbers as IV+II=VI to 4+2=6. If you had talked with Pythagoras, he probably couldn’t have imagined a theoretical breakthrough like that.
You seem to be treating biology to some extent like it is physics. But these are complex systems. What makes you think that such approaches will be at all successful?
I don’t. I don’t know much about physics. Paleo/Quantified Self people found the thing with vitamin D in the morning through phenomenology. The community is relatively small, and the amount of work that’s invested into the theoretical underpinnings is small.
From my exposure to the field of biology from various angles, I think there are a lot of areas where things aren’t clear and there is room for improvement at the level of epistemology and ontology.
I just recently preordered two Angel sensors from the crowdfunding website Indiegogo. I think that the money the company gets will do much more to advance medicine than the average NIH grant.
The fact that Big Pharma has to lay off a lot of scientists is a real-world indication that the model of finding a drug target, screening thousands of compounds against it, running those compounds through clinical trials to find out whether they are any good, and then coming out at the other end with drugs that cure important illnesses has stopped producing results.
This seems like extremely weak evidence. Diminishing marginal returns are a common thing in many areas. For example, engineering better trains happened a lot in the second half of the 19th century and the early 20th century. That slowed down, not because of some lack of theoretical background, but because the technology reached maturity. Now, improvements in train technology do occur, but slowly.
Saying that there’s a file drawer problem is quite easy. That, however, is not a solution. I think your problem is that you can’t imagine a theory that would solve the problem. That’s typical. If it were easy to imagine a theoretical breakthrough beforehand, it wouldn’t be much of a breakthrough.
On the contrary. We have ways of handling the file drawer problem, and they aren’t theory-based issues. Pre-registration of studies works. It isn’t even clear to me what it would mean to have a theoretical solution to the file drawer problem, given that it is a problem about culture, and a type of problem that exists in any field. It makes about as much sense to talk about how having better theory could somehow solve type I errors.
Look at the theoretical breakthrough of moving from a model of numbers as IV+II=VI to 4+2=6. If you had talked with Pythagoras, he probably couldn’t have imagined a theoretical breakthrough like that.
The ancient Greeks used the Babylonian number system and the Greek system. They did not use Roman numerals.
It isn’t even clear to me what it would mean to have a theoretical solution to the file drawer problem, given that it is a problem about culture, and a type of problem that exists in any field.
The file drawer problem is about an effect. If you can estimate exactly how large that effect is when you look at the question of whether to take a certain drug, you solve the problem, because you can just run the numbers.
On the contrary. We have ways of handling the file drawer problem, and they aren’t theory-based issues. Pre-registration of studies works.
The concept of the file drawer problem first appeared in 1976, if I can trust Google Ngrams.
How much money do you think it cost to run the experiments to come up with the concept of the file drawer problem and the concept of pre-registration of studies?
I don’t think that’s knowledge that got created by running expensive experiments. It came from people engaging in theoretical thinking.
It makes about as much sense to talk about how having better theory could somehow solve type I errors.
Type I errors are a feature of frequentist statistics. If you don’t use null hypotheses you don’t make type I errors. Bayesians don’t make type I errors because they don’t have null hypotheses.
How much money do you think it cost to run the experiments to come up with the concept of the file drawer problem and the concept of pre-registration of studies? I don’t think that’s knowledge that got created by running expensive experiments. It came from people engaging in theoretical thinking.
After some background about NHST on page 1, Sterling immediately begins tallying tests of significance in a year’s worth of 4 psychology journals on page 2, and discovers that, e.g., of 106 tests, 105 rejected the null hypothesis. On page 3, he discusses how this bias could come about.
So at least in this very early discussion of publication bias, it was driven by people engaged in empirical thinking.
After some background about NHST on page 1, Sterling immediately begins tallying tests of significance in a year’s worth of 4 psychology journals on page 2, and discovers that, e.g., of 106 tests, 105 rejected the null hypothesis. On page 3, he discusses how this bias could come about.
I think doing a literature review is engaging in using other people’s data. For the sake of this discussion, JoshuaZ claimed that Einstein was doing theoretical work when he worked with other people’s data.
If I want to draw information from a literature review to gather insights, I don’t need expensive equipment. JoshuaZ claimed that you need expensive equipment to gather new insights in biology. I claim that’s not true.
I claim that there is enough published information that’s not well organised into theories that you can make major advances in biology without needing to buy any equipment.
As far as I understand, you don’t run experiments on participants to see whether Dual n-back works. You simply gathered Dual n-back data from other people and tried doing it yourself to know how it feels.
That’s not expensive. You don’t need to write large grants to get a lot of money to do that kind of work.
You do need some money to pay your bills. Einstein made that money through being a patent clerk. I don’t know how you make your money to live. Of course you don’t have to tell and I respect if that’s private information.
For all I know you could be making money by being a patent clerk like Einstein.
A scientist who can’t work on his grant projects because of the government shutdown could use his free time to do the kind of work that you are doing.
If you don’t like the label “theoretical”, that’s fine. If you want to propose a different label that distinguishes your approach from the fancy-expensive-experiments approach, I’m open to using another label.
I think in the last decades we have had an explosion in the amount of data in biology. I think that organising that data into theories lags behind. I think it takes less effort to advance biology by organising data into theories and doing a bit of phenomenology than to push further for expensive, equipment-produced knowledge.
I claim that there is enough published information that’s not well organised into theories that you can make major advances in biology without needing to buy any equipment.
This can be true but also suboptimal. I’m sure that given enough cleverness and effort, we could extract a lot of genetic causes out of existing SNP databases—but why bother when we can wait a decade and sequence everyone for $100 a head? People aren’t free, and equipment both complements and substitutes for them.
As far as I understand, you don’t run experiments on participants to see whether Dual n-back works. You simply gathered Dual n-back data from other people and tried doing it yourself to know how it feels. That’s not expensive. You don’t need to write large grants to get a lot of money to do that kind of work.
I assume you’re referring to my DNB meta-analysis? Yes, it’s not gathering primary data—I did think about doing that early on, which is why I carefully compiled all anecdotes mentioning IQ tests in my FAQ, but I realized that between the sheer heterogeneity, lack of a control group, massive selection effects, etc, the data was completely worthless.
But I can only gather the studies into a meta-analysis because people are running these studies. And I need a lot of data to draw any kind of conclusion. If n-back studies had stopped in 2010, I’d be out of luck, because with the studies up to 2010, I can exclude zero as the net effect, but I can’t make a rigorous statement about the effect of passive vs active control groups. (In fact, it’s only with the last 3 or 4 studies that the confidence intervals for the two groups stopped overlapping.) And these studies are expensive. I’m corresponding with one study author to correct the payment covariate, and it seems that on average participants were paid $600 - and there were 40, so they blew $24,000 just on paying the subjects, never mind paying for the MRI machine, the grad students, the professor time, publication, etc. At this point, the total cost of the research must be well into the millions of dollars.
It’s true that it’s a little irritating that no one has published a meta-analysis on DNB, and that it’s not that difficult for a random person like myself to do it—it requires little in the way of resources—but that doesn’t change the fact that I still needed these dozens of professionals to run all these very expensive experiments to provide grist for the mill.
To go way up to Einstein, he was drawing on a lot of expensive data like that which showed the Mercury anomaly, and then was verified by very expensive data (I shudder to think how much those expeditions must have cost in constant dollars). Without that data, he would just be another… string theorist. Not Einstein.
You do need some money to pay your bills. Einstein made that money through being a patent clerk. I don’t know how you make your money to live. Of course you don’t have to tell and I respect if that’s private information. For all I know you could be making money by being a patent clerk like Einstein.
Not by being a patent clerk, no. :)
A scientist who can’t work on his grant projects because of the government shutdown could use his free time to do the kind of work that you are doing.
To a very limited extent. There have to be enough studies to productively review, and there have to be no existing reviews you’re duplicating. To give another example: suppose I had been furloughed and wanted to work on a creatine meta-analysis. I get as far as I’ve gotten now—not that hard, maybe 10 hours of work—and I realize there’s only 3 studies. Now what? Well, what I am doing is waiting a few months for 2 scientists to reply, and then I’ll wait another 5 or 10 years for governments to fund more psychology studies which happen to use creatine. But in no way can I possibly “finish” this even given months of government-shutdown-time.
I think in the last decades we have had an explosion in the amount of data in biology, but organising that data into theories lags behind. I think it takes less effort to advance biology by organising existing data into theories and doing a bit of phenomenology than by pushing further for knowledge produced by expensive equipment.
I don’t think that’s a stupid or obviously incorrect claim, but I don’t think it’s right. Equipment is advancing fast (if not always as fast as my first example of genotyping/sequencing), so it’d be surprising to me if you could do more work by ignoring potential new data and reprocessing old work, and more generally, even though stuff like meta-analysis is accessible to anyone for free (case in point: myself), we don’t see anyone producing any impressive discoveries. Case in point: more than a few researchers already believed n-back might be an artifact of the control groups before I started my meta-analysis—my results are a welcome confirmation, not a novel discovery; or to use your vitamin D example, yes, it’s cool that we found an effect of vitamin D on sleep (I certainly believe it), but the counterfactual of “QS does not exist” is not “vitamin D’s effect on sleep goes unknown” but “Gominak discovers the effect on her patients and publishes a review paper in 2012 arguing that vitamin D affects sleep”.
Type I errors are a feature of frequentist statistics. If you don’t use null hypotheses you don’t make type I errors. Bayesians don’t make type I errors because they don’t have null hypotheses.
LOL. That’s, um, not exactly true.
Let’s take a new drug trial. You want to find out whether the drug has certain (specific, detectable) effects. Could you please explain how a Bayesian approach to the results of the trial would make it impossible to make a Type I error, that is, a false positive: decide that the drug does have effects while in fact it does not?
The output of a bayesian analysis isn’t a truth value but a probability.
So is the output of a frequentist analysis.
However real life is full of step functions which translate probabilities into binary decisions. The FDA needs to either approve the drug or not approve the drug.
Saying “I will never make a Type I error because I will never make a hard decision” doesn’t look good as evidence for the superiority of Bayes...
However real life is full of step functions which translate probabilities into binary decisions.
Decisions are not the result of statistical tests but of utility functions.
A bayesian takes the probability that he gets from his statistics and puts that into his utility function.
Type I errors are a feature of statistical tests and not of decision functions.
There’s a difference between asking yourself “Does this drug work better than other drugs?” and then deciding, based on the answer to that question, whether or not to approve the drug, and asking “What’s the probability that the drug works?” and making a decision based on that.
In practice the FDA does ask their statistical tools “Does this drug work better than other drugs?” and then decides on that basis whether to approve the drug.
Why is that a problem? Take an issue like developing new antibiotics. Antibiotics are an area where there is a consensus that not enough money goes into developing new ones. The special need comes out of the fact that bacteria can develop resistance to drugs.
A bayesian FDA could just change the utility factor that goes into calculating the value of approving a new antibiotic.
Skipping the whole “Does this drug work?” question and instead focusing on the question “What’s the expected utility from approving the drug?”
The bayesian FDA could get a probability value that the drug works from the trial and another number to quantify the seriousness of side effects. Those numbers can go together into a utility function for making a decision.
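A toy sketch of the kind of utility calculation being proposed (the function names, numbers, and threshold are all hypothetical illustrations, not any actual FDA procedure):

```python
def expected_utility(p_works, benefit, p_side_effect, harm):
    """Expected utility of approval: benefit weighted by the posterior
    probability the drug works, minus expected harm from side effects."""
    return p_works * benefit - p_side_effect * harm

def approve(p_works, benefit, p_side_effect, harm, threshold=0.0):
    """Approve when expected utility clears the threshold."""
    return expected_utility(p_works, benefit, p_side_effect, harm) > threshold

# An antibiotic could get a larger `benefit` weighting to reflect unmet need
print(approve(0.6, benefit=10.0, p_side_effect=0.2, harm=5.0))  # True
```

The point of the sketch is that no binary "does it work?" verdict appears anywhere: the trial's posterior probability feeds directly into the utility calculation.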
Developing a good framework which the FDA could use to make such decisions would be theoretical work.
The kind of work into which not enough intellectual effort goes, because scientists would rather play with fancy equipment.
If the FDA published utility values for the drugs that it approves, that would also help insurance companies.
An insurance company could sell you, for a certain price, insurance that pays for drugs that exceed a certain utility value.
You could simply factor the file drawer effect into such a model. If a company preregisters a trial and doesn’t publish it, the utility score of the drug goes down.
Preregistered trials count more towards the utility of the drug than trials that aren’t preregistered, so you create an incentive for registration.
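A minimal sketch of such an incentive scheme; the weights here are arbitrary placeholders, chosen only to show the shape of the idea:

```python
def trial_weight(preregistered, published):
    """Hypothetical evidence weights: preregistered trials count extra,
    and a preregistered trial that never gets published counts against
    the drug (the file-drawer penalty)."""
    if preregistered and not published:
        return -1.0
    return 1.5 if preregistered else 1.0

# (preregistered, published) status of three hypothetical trials
trials = [(True, True), (False, True), (True, False)]
score = sum(trial_weight(pre, pub) for pre, pub in trials)
print(score)  # 1.5 + 1.0 - 1.0 = 1.5
```

Under weights like these, burying a preregistered trial actively lowers the drug's score, so a company has no incentive to hide unflattering results.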
You can do all sorts of things when you think about designing a utility function that goes beyond “Does this drug work better than existing ones?” (Yes/No) and “Is it safe?” (Yes/No).
You can even ask whether the FDA should do approval at all. You can just allow all drugs but say that insurance only pays for drugs with a certain demonstrated utility score. Just pay Big Pharma more for drugs that have high demonstrated utility.
There you have a model of an FDA that wouldn’t make any Type I errors.
In an afternoon, I worked out the basis of a solution to a theoretical problem that JoshuaZ considered unsolvable.
*I would add that if you want to end the war on drugs, this proposal matters a lot. (Details left as an exercise for the reader)
Consider Alice and Bob. Alice is a mainstream statistician, aka a frequentist. Bob is a Bayesian.
We take our clinical trial results and give them to both Alice and Bob.
Alice says: the p-value for the drug effectiveness is X. This means that there is X% probability that the results we see arose entirely by chance while the drug has no effect at all.
Bob says: my posterior probability for drug being useless is Y. This means Bob believes that there is (1-Y)% probability that drug is effective and Y% probability that is has no effect.
Given that both are competent and Bob doesn’t have strong priors X should be about the same as Y.
Do note that both Alice and Bob provided a probability as the outcome.
Now after that statistical analysis someone, let’s call him Trent, needs to make a binary decision. Trent says “I have a threshold of certainty/confidence Z. If the probability of the drug working is greater than Z, I will make a positive decision. If it’s lower, I will make a negative decision”.
Alice comes forward and says: here is my probability of the drug working, it is (1-X).
Bob comes forward and says: here is my probability of the drug working, it is (1-Y).
So, you’re saying that if Trent relies on Alice’s number (which was produced in the frequentist way) he is in danger of committing a Type I error. But if Trent relies on Bob’s number (which was produced in the Bayesian way) he cannot possibly commit a Type I error. Yes?
And then you start to fight the hypothetical and say that Trent really should not make a binary decision. He should just publish the probability and let everyone make their own decisions. Maybe—that works in some cases and doesn’t work in others. But Trent can publish Alice’s number, and he can publish Bob’s number—they are pretty much the same and both can be adequate inputs into some utility function. So where exactly is the Bayesian advantage?
Given that both are competent and Bob doesn’t have strong priors X should be about the same as Y.
Why? X is P(results >= what we saw | effect = 0), whereas Y is P(effect < costs | results = what we saw). I can see no obvious reason those would be similar, not even if we assume costs = 0; p(results = what we saw | effect = 0) = p(effect = 0 | results = what we saw) iff p_{prior}(result = what we saw) = p_{prior}(effect = 0) (where the small p’s are probability densities, not probability masses), but that’s another story.
You have two samples: one was given the drug, the other was given the placebo. You have some metric for the effect you’re looking for, a value of interest.
The given-drug sample has a certain distribution of the values of your metric which you model as a random variable. The given-placebo sample also has a distribution of these values (different, of course) which you also model as a random variable.
The statistical questions are whether these two random variables are different, in which way, and how confident you are of the answers.
For simple questions like that (and absent strong priors) the frequentists and the Bayesians will come to very similar conclusions and very similar probabilities.
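One way to see why the numbers come out so close: for a two-sample comparison with known variance and a flat prior on the difference, the Bayesian posterior probability that the true effect is ≤ 0 is numerically the same as the one-sided p-value. A sketch with simulated data (the means, variances, and sample sizes are arbitrary choices for illustration):

```python
import math
import random

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

random.seed(0)
drug = [random.gauss(0.5, 1) for _ in range(50)]
placebo = [random.gauss(0.0, 1) for _ in range(50)]

mean = lambda xs: sum(xs) / len(xs)
diff = mean(drug) - mean(placebo)
se = math.sqrt(1 / 50 + 1 / 50)  # known unit variance, for simplicity
z = diff / se

p_one_sided = 1 - phi(z)  # frequentist: P(data this extreme | no effect)
posterior_null = phi(-z)  # Bayesian, flat prior: P(true effect <= 0 | data)
print(p_one_sided, posterior_null)  # numerically identical here
```

The identity depends entirely on the flat prior; with an informative prior the two numbers diverge, and they are answers to different questions even when they coincide.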
For simple questions like that (and absent strong priors) the frequentists and the Bayesians will come to very similar conclusions and very similar probabilities.
Yes, but the p-value and the posterior probability aren’t even the same question, are they?
Alice says: the p-value for the drug effectiveness is X. This means that there is X% probability that the results we see arose entirely by chance while the drug has no effect at all.
No. You don’t understand null hypothesis testing. It doesn’t measure whether the results arose entirely by chance. It measures whether a specific null hypothesis can be rejected.
I hate to disappoint you, but I do understand null hypothesis testing. In this particular example the specific null hypothesis is that the drug has no effect and therefore all observable results arose entirely by chance.
You are really determined to fight the hypothetical, aren’t you? :-) Let me quote myself with the relevant part emphasized: “You want to find out whether the drug has certain (specific, detectable) effects.”
I could simply run n=1 experiments
And how would they help you? There is the little issue of noise. You cannot detect any effects below the noise floor and for n=1 that floor is going to be pretty high.
“You want to find out whether the drug has certain (specific, detectable) effects.”
A p-value isn’t the probability that a drug has certain (specific, detectable) effects. 1-p isn’t either.
You are really determined to fight the hypothetical, aren’t you?
No, I’m accepting it. The probability of a drug having zero effects is 0. If your statistics give you a probability other than 0 for the drug having zero effects, your statistics are wrong.
I think your answer suggests the idea that an experiment might provide actionable information.
And how would they help you? There is the little issue of noise. You cannot detect any effects below the noise floor and for n=1 that floor is going to be pretty high.
But you still claim that every experiment provides an actionable probability when interpreted by a frequentist.
If you give a bayesian your priors and then get a posterior probability from the bayesian, that probability is in every case actionable.
Again: the probability that a drug has no specific, detectable effects is NOT zero.
I don’t care about detectability when I take a drug. I care about whether it helps me.
I want a number that tells me the probability of the drug helping me. I don’t want the statistician to answer a different question.
Detectability depends on the power of a trial.
If a frequentist gives you some number after he has analysed an experiment, you can’t just fit that number into a decision function.
You have to think about issues such as whether the experiment had enough power to pick up an effect.
If a bayesian gives you a probability you don’t have to think about such issues because the bayesian already integrates your prior knowledge. The probability that the bayesian gives you can be directly used.
Drug trials are neither designed to, nor capable of answering questions like this.
Whether a drug will help you is a different probability that comes out of a complicated evaluation for which the drug trial results serve as just one of the inputs.
If a bayesian gives you a probability you don’t have to think about such issues
Whether a drug will help you is a different probability that comes out of a complicated evaluation for which the drug trial results serve as just one of the inputs.
That evaluation is by its nature bayesian. Bayes’ rule is about combining different probabilities.
At the moment there is no systematic way of going about it. That’s where theory development is needed. I would like someone like the FDA to write down all their priors and then provide some computer analysis tool that actually calculates that probability.
I am sorry, you’re speaking nonsense.
If the priors are correct, then a correct bayesian analysis gives me exactly the probability I should believe after I read the study.
That’s a good question, but in this context, seeing a variety of novel discoveries in the last few years indicates a somewhat successful field.
No, seeing a bunch of novel true discoveries indicates a successful field. However, it’s normally hard to independently verify the truth of novel discoveries except in cases where those discoveries have applications.
This seems like a nitpick more than a serious remark: obviously one is talking about the true discoveries, and giving major examples of them in biology is not at all difficult. The discovery of RNA interference is in the biochem end of things, while a great number of discoveries have occurred in paleontology as well as using genetics to trace population migrations (both humans and non-humans).
it’s normally hard to independently verify the truth of novel discoveries except in cases where those discoveries have applications.
So one question here is, for what types of discoveries is your prior high that the discovery is bogus? And how will you tell? General skepticism probably makes sense for a lot of medical “breakthroughs” but there’s a lot of biology other than those.
Einstein was more productive at producing scientific breakthroughs when he worked in a patent office in 1905 than when he had big grants.
It’s not at all clear whether writing grants to run fancy experiments, then publishing papers that don’t replicate in order to keep a publication rate high enough to get further grants, helps the scientific project.
Bad example. Einstein was A) doing physics at a time when the size of budgets needed to make new discoveries was much smaller B) primarily doing theoretical work or work that relied on other people’s data. Many areas of research (e.g. much of particle physics, a lot of condensed matter, most biology) require funding for resources simply to do anything at all.
I don’t think that’s true.
If you take something like the highly useful discovery that taking vitamin D in the morning is more effective than in the evening, that discovery was made in the last decade by amateurs without budgets.
Fermi estimates aren’t easy, but that discovery might be worth a year of lifespan. If you look at what the Google people are saying, solving cancer is worth three years of lifespan. The people who publish breakthrough results in cancer research have replication rates of under 10 percent. Just as Petrov didn’t get a Nobel Peace Prize, the people advancing human health don’t get biology Nobel Prizes.
Relying on other people’s data is much easier now than it was in Einstein’s time. Open science doesn’t go as far as I would like, but being able to transfer data easily via computers makes things so much easier.
The fact that most work in biology relies on experiments suggests that there are not enough people doing good theoretical work in the field. I don’t know much about particle physics, but I’m not sure whether we need as many smart people doing particle physics as we have at the moment.
So there are two distinct arguments being made: one is a resource allocation argument (it would be better to spend fewer resources right now on things like particle physics) and the second argument is that in many fields one can still make discoveries with few resources. The first argument may have some validity. The second argument ignores how much work is required in most cases. Yes, one can do things like investigate specific vitamin metabolism issues. But if one is interested in say synthesizing new drugs, or investigating how those drugs would actually impact people that requires large scale experiments.
That’s not what is going on here. The issue is that biology is complicated. Life doesn’t have easy systems that have easy theoretical underpinnings that can be easily computed. There are literally thousands of distinct chemicals in a cell interacting, and when you introduce a new one, even if you’ve designed it to interact with a specific receptor, it will often impact others. And even if it does only impact the receptor in question, how it does so will matter. You are dealing with systems created by the blind-idiot god.
You are defending a way of doing biology that is plagued by various problems. It’s a field where people literally believe that they can perceive more when they blind themselves.
There are huge issues in the theoretical underpinnings of that approach, because the people in the system are too busy writing non-replicating research for top-tier journals—research that requires expensive equipment—instead of thinking more about how to approach the field.
So every field has problems, but that doesn’t mean those problems are “huge”.
Outside view: An entire field which is generally pretty successful at actually finding what is going on is fundamentally misguided about how they should be approaching the field, or the biologists are doing what they can. Biology is hard. But we are making progress in biology at a rapid rate. For example, the use of genetic markers to figure out how to treat different cancers was first proposed in the early 1990s and is now a highly successful clinical method.
Really? Can you point to a paper demonstrating it’s better than classifying cancers the way histologists did in the 80s? Everything I’ve seen says that it just reconstructs the same classification. But it took ten years for the geneticists to admit that. I’ve seen more recent genetic classification that might be better than the old ones, but they didn’t bother to compare to the old genetic classifications, let alone the histology.
HER2 receptor. These days those with breast cancer that overexpresses this growth factor receptor tend to get monoclonal antibodies against it, which both suppress its growth effects and tag it for disruption by the immune system.
Yes, this is a protein test rather than a genetic test. But it lets the subset of people with this amplification get a treatment that has a large positive absolute effect on those with early-stage cancer.
I don’t know enough about that subfield to answer that question. If what you are saying is accurate, that’s highly disturbing. Most of my exposure to that subfield has been to popular press articles such as this one which paint a picture that sounds much more positive, but may well be highly distorted from what’s actually going on.
You might be but I’m not really.
That’s a crude method of measuring success.
The cost of new drugs rises exponentially via Eroom’s law. Big Pharma constantly lays off people.
A problem like obesity grows worse over the years instead of seeing progress. Diabetes gets worse.
Even if you say that science isn’t about solving real-world issues but about knowledge, I also think that a replication rate of 11% in the case of breakthrough cancer research indicates that the field is not good at finding out what’s going on.
I don’t think a flat replication rate of 11% tells us anything without recourse to additional considerations. It’s sort of like a Umeshism: if your experiments are not routinely failing, you aren’t really experimenting. The best we can say is that 0% and 100% are both suboptimal...
For example, if I was told that anti-aging research was having a 11% replication rate for its ‘stopping aging’ treatments, I would regard this as shockingly too high and a collective crime on par with the Nazis, and if anyone asked me, would tell them that we need to spend far far more on anti-aging research because we clearly are not trying nearly enough crazy ideas. And if someone told me the clinical trials for curing balding were replicating at 89%, I would be a little uneasy and wonder what side-effects we were exposing all these people to.
(Heck, you can’t even tell much about the quality of the research from just a flat replication rate. If the prior odds are 1 in 10,000, then 11% looks pretty damn good. If the prior odds are 1 in 5, pretty damn bad.)
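The arithmetic behind that parenthetical can be made explicit with the standard positive-predictive-value formula; the α and power defaults below are conventional textbook values, not measured ones:

```python
def ppv(prior, alpha=0.05, power=0.8):
    """P(effect is real | significant result), given the base rate of
    true hypotheses, the false-positive rate, and statistical power."""
    return power * prior / (power * prior + alpha * (1 - prior))

print(ppv(1 / 10_000))  # ~0.002: here an 11% replication rate would be excellent
print(ppv(1 / 5))       # ~0.8: here an 11% replication rate would be dismal
```

The same 11% observed rate is thus evidence of very different things depending on what fraction of tested hypotheses were true to begin with.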
What I would accept as a useful invocation of an 11% rate is, say, an economic analysis of the benefits showing that this represents over-investment (for example, falling pharmacorp share prices) or surprise by planners/scientists/CEOs/bureaucrats where they had held more optimistic assumptions (and so investment is likely being wasted). That sort of thing.
Replication rate of experiments is quite different from the success rate of experiments.
An 11% success rate is often shockingly high. An 11% replication rate means the researchers are sloppy, value publishing over confidence in the results, and likely do way too much of throwing spaghetti at the wall...
Even granting your distinction, the exact same argument still applies: just substitute in an additional rate of, say, 10% chance of going from replication to whatever you choose to define as ‘success’. You cannot say that a 11% replication rate and then a 1.1% success rate is optimal—or suboptimal—without doing more intellectual work!
No, I don’t think so. An 11% replication rate means that 89% of the published results are junk and external observers have no problems seeing that. Which implies that if those who published it were a bit more honest/critical/responsible, they should have been able to do a better job of controlling for the effects which lead them to think there’s statistical significance when in fact there’s none.
If the prior odds are 1:10,000 you have no business publishing results at 0.05 confidence level.
Yes, so? As Edison said, I have discovered 999 ways to not build a lightbulb.
Huh? No. As I already said, you cannot go from replication rate to judgment of the honesty, competency, or insight of researchers without additional information. Most obviously, it’s going to be massively influenced by the prior odds of the hypotheses.
No one has any business publishing at an arbitrary confidence level, which should be chosen with respect to some even half-assed decision analysis. 1:10,000 or 1:1000, doesn’t matter.
You’re still ignoring the difference between a failed experiment and a failed replication.
Edison did not publish 999 papers each of them claiming that this is the way to build the lightbulb (at p=0.05).
And what exactly prevents the researchers from considering the prior odds when they are trying to figure out whether their results are really statistically significant?
I disagree with you—if a researcher consistently publishes research that cannot be replicated I will call him a bad researcher.
So? What does this have to do with my point about optimizing return from experimentation?
Nothing. But no one does that because to point out that a normal experiment has resulted in a posterior probability of <5% is not helpful since that could be said of all experiments, and to run a single experiment so high-powered that it could single-handedly overcome the prior probability is ludicrously wasteful. You don’t run a $50m clinical trial enrolling 50,000 people just because some drug looks interesting.
Too bad. You should get over that.
I think our disagreement comes (at least partially) from the different views on what does publishing research mean.
I see your position as looking on publishing as something like “We did A, B, and C. We got the results X and Y. Take it for what it is. The end.”
I’m looking on publishing more like this: “We did multiple experiments which did not give us the magical 0.05 number so we won’t tell you about them. But hey, try #39 succeeded and we can publish it: we did A39, B39, and C39 and got the results X39 and Y39. The results are significant so we believe them to be meaningful and reflective of actual reality. Please give our drug to your patients.”
The realities of scientific publishing are unfortunate (and yes, I know of efforts to ameliorate the problem in medical research). If people published all their research (“We did 50 runs with the following parameters, all failed, sure #39 showed statistical significance but we don’t believe it”) I would have zero problems with it. But that’s not how the world currently works.
P.S. By the way, here is some research which failed replication (via this)
That would be a better world. But in this world, it would still be true that there is no universal, absolute, optimal percentage of experiments failing to replicate, and the optimal percentage is set by decision-theoretic/economic concerns.
Experiments that fail to replicate at percentages greater than those expected from published confidence values (say, posterior probabilities) are evidence that the published confidence values are wrong.
A research process that consistently produces wrong confidence values has serious problems.
How would you know? People do not produce posterior probabilities or credible intervals, they produce confidence intervals and p-values.
I don’t see how this point helps you.
Either the p-values in the papers are worthless in the sense of not reflecting the probability that the observed effect is real—in which case the issue in the parent post stands.
Or the p-values, while not perfect, do reflect the probability the effect is real—in which case they are falsified by the replication rates and in which case the issue in the parent post stands.
p-values do not reflect the probability that the observed effect is real but the inverse, and no one has ever claimed that, so we can safely dismiss this entire line of thought.
p-values can, with some assumptions and choices, be used to calculate other things like positive predictive value/PPV, which are more meaningful. However, the issue still stands. Suppose a field’s studies have a PPV of 20%. Is this good or bad? I don’t know—it depends on the uses you intend to put it to and the loss function on the results.
Maybe it would be helpful if I put it in Bayesian terms where the terms are more meaningful & easier to understand. Suppose an experiment turns in a posterior with 80% of the distribution >0. Subsequent experiments or additional data collection will agree with and ‘replicate’ this result the obvious amount.
Now, was this experiment ‘underpowered’ (it collected too little data and is bad) or ‘overpowered’ (too much and inefficient/unethical) or just right? Was this field too tolerant of shoddy research practices in producing that result?
Well, if the associated loss function has a high penalty on true values being <0 (because the cancer drugs have nasty side-effects and are expensive and only somewhat improve on the other drugs) then it was probably underpowered; if it has a small loss function (because it was a website A/B test and you lose little if it was a worse variant) then it was probably overpowered because you spent more traffic/samples than you had to to choose a variant.
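The asymmetric-loss point can be sketched as a two-action decision rule; the loss numbers are illustrative only:

```python
def decide(p_effect, loss_false_positive, loss_false_negative):
    """Pick the action with lower expected loss, given the posterior
    probability that the true effect is > 0."""
    loss_act = (1 - p_effect) * loss_false_positive   # act, but effect <= 0
    loss_pass = p_effect * loss_false_negative        # pass, but effect > 0
    return "act" if loss_act < loss_pass else "pass"

# The same 80% posterior, under different stakes:
print(decide(0.8, loss_false_positive=1, loss_false_negative=1))   # act
print(decide(0.8, loss_false_positive=10, loss_false_negative=1))  # pass
```

The identical evidence justifies acting in the cheap A/B-test case and holding off in the nasty-side-effects case, which is the sense in which "was the study powered enough?" has no answer independent of the loss function.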
The ‘replication crises’ are a ‘crisis’ in part because people are basing meaningful decisions on the results to an extent that cannot be justified if one were to explicitly go through a Bayesian & decision theory analysis with informative data. eg pharmacorps probably should not be spending millions of dollars to buy and do preliminary trials on research which is not much distinguishable from noise, as they have learned to their intense frustration & financial cost, to say nothing of diet research. If the results did not matter to anyone, then it would not be a big deal if the PPV were 5% rather than 50%: the researchers would cope, and other people would not make costly suboptimal decisions.
There is no single replication rate which is ideal for cancer trials and GWASes and individual differences psychology research and taxonomy and ecology and schizophrenia trials and...
It isn’t a metric of success. It is an example, one of many in the biological sciences.
This is likely due largely to policy issues and legal issues more than it is how the biologists are thinking. Clinical trials have gotten large.
A systemic problem, but one that has even less to do with biological research than Eroom’s law. Obesity is not due to a lack of theoretical underpinnings in biology.
The question isn’t is the field very good. The question is are the problems which we both agree exist due at all to not enough theory? File drawer effects, cognitive biases, bad experimental design are all issues here, none of which fall into that category.
Then on what grounds do you claim that the field is successful? How would you know if it weren’t successful?
I’m not saying that the field lacks theoretical underpinnings but that the underpinnings are of bad quality.
Questions about designing experiments in a way that produces reproducible results instead of merely impressive p-values are theoretical issues.
“Enough theory” sounds like an attempt to quantify the amount of theory. That’s not what I advocate. Theories don’t get better through an increase in their quantity. Good theoretical thinking can simplify models and result in less complex theory.
That’s a good question, but in this context, seeing a variety of novel discoveries in the last few years indicates a somewhat successful field. By the same token, I’m curious what makes you think this isn’t a successful field?
I’ve already mentioned the file drawer problem. I’m curious, do you think that is a theoretical problem? If so, this may come down in part due to a very different notion of what theory means.
You seem to be treating biology to some extent like it is physics. But these are complex systems. What makes you think that such approaches will be at all successful?
The fact that Big Pharma has to lay off a lot of scientists is a real-world indication that the model of finding a drug target, screening thousands of compounds against it, running those compounds through clinical trials to find whether they are any good, and then coming out with drugs that cure important illnesses at the other end has stopped producing results. Eroom’s law.
Saying that there’s a file drawer problem is quite easy. That, however, is not a solution. I think your problem is that you can’t imagine a theory that would solve the problem. That’s typical. If it were easy to imagine a theoretical breakthrough beforehand, it wouldn’t be much of a breakthrough.
Look at the theoretical breakthrough of moving from a model of numbers like IV+II=VI to 4+2=6. If you had talked with Pythagoras, he probably couldn’t have imagined a theoretical breakthrough like that.
I don’t. I don’t know much about physics. Paleo/Quantified Self people found the thing with vitamin D in the morning through phenomenology. The community is relatively small and the amount of work that’s invested into its theoretical underpinning is small.
In my exposure to the field of biology from various angles, I think there are a lot of areas where things aren’t clear and there is room for improvement at the level of epistemology and ontology.
I just recently preordered two Angel sensors from the crowdfunding website Indiegogo. I think that the money the company gets will do much more to advance medicine than the average NIH grant.
This seems like extremely weak evidence. Diminishing marginal returns are a common thing in many areas. For example, engineering better trains happened a lot in the second half of the 19th century and the early 20th century. That slowed down, not because of some lack of theoretical background, but because the technology reached maturity. Now, improvements in train technology do occur, but slowly.
On the contrary. We have ways of handling the file drawer problem, and they aren’t theory-based. Pre-registration of studies works. It isn’t even clear to me what it would mean to have a theoretical solution to the file drawer problem, given that it is a problem about culture, and a type of problem that exists in any field. It makes about as much sense to talk about how having better theory could somehow solve Type I errors.
The ancient Greeks used the Babylonian number system and the Greek system. They did not use Roman numerals.
The file drawer problem is about an effect. If you can estimate exactly how large the effect is, then when you look at the question of whether to take a certain drug, you have solved the problem, because you can just run the numbers.
The concept of the file drawer problem first appeared in 1976, if I can trust Google Ngrams.
How much money do you think it cost to run the experiments that produced the concept of the file drawer problem and the concept of pre-registration of studies? I don’t think that’s knowledge that got created by running expensive experiments. It came from people engaging in theoretical thinking.
Type I errors are a feature of frequentist statistics. If you don’t use null hypotheses, you don’t make Type I errors. Bayesians don’t make Type I errors because they don’t have null hypotheses.
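To make the distinction being argued here concrete, here is a toy sketch (all numbers invented, not from any real trial): a frequentist test ends in a binary reject/don’t-reject verdict, which is where a Type I error can occur, while a Bayesian analysis outputs a posterior probability over hypotheses and stops there. The z-test normal approximation and the two-point-hypothesis Bayesian model are deliberate simplifications.

```python
# Toy sketch (invented numbers): frequentist verdict vs Bayesian posterior.
from statistics import NormalDist, mean, stdev
from math import sqrt

drug    = [5.1, 4.9, 6.2, 5.8, 5.5, 6.0, 5.7, 5.3]   # invented outcome scores
placebo = [4.8, 5.0, 4.7, 5.2, 4.9, 5.1, 4.6, 5.0]

diff = mean(drug) - mean(placebo)
se = sqrt(stdev(drug)**2 / len(drug) + stdev(placebo)**2 / len(placebo))

# Frequentist: z-test of the null "no difference" (normal approximation).
z = diff / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_null = p_value < 0.05   # the binary step where a Type I error can live

# Bayesian toy: two point hypotheses, H0 "effect = 0" vs H1 "effect = +0.5",
# equal priors; update on the observed mean difference.
lik_h0 = NormalDist(0.0, se).pdf(diff)
lik_h1 = NormalDist(0.5, se).pdf(diff)
posterior_h1 = lik_h1 / (lik_h0 + lik_h1)  # a probability, not a verdict
```

The point of contention in the rest of the thread is whether stopping at `posterior_h1` (rather than `reject_null`) actually helps once someone downstream must make a yes/no decision.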
The earliest citation in the Rosenthal paper that coined the term ‘file drawer’ is to a 1959 paper by one Theodore Sterling; I jailbroke this to “Publication Decisions and Their Possible Effects on Inferences Drawn from Tests of Significance—or Vice Versa”.
After some background about NHST on page 1, Sterling immediately begins tallying tests of significance in a year’s worth of 4 psychology journals on page 2, and discovers that, e.g., of 106 tests, 105 rejected the null hypothesis. On page 3, he discusses how this bias could come about.
So at least in this very early discussion of publication bias, it was driven by people engaged in empirical thinking.
I think doing a literature review is engaging in using other people’s data. For the sake of this discussion, JoshuaZ claimed that Einstein was doing theoretical work when he worked with other people’s data.
If I want to draw information from a literature review to gather insights, I don’t need expensive equipment. JoshuaZ claimed that you need expensive equipment to gather new insights in biology. I claim that’s not true. I claim there is enough published information that’s not well organised into theories that you can make major advances in biology without needing to buy any equipment.
As far as I understand, you didn’t run experiments on participants to see whether dual n-back works. You simply gathered dual n-back data from other people and tried doing it yourself to know what it feels like. That’s not expensive. You don’t need to write large grants to get a lot of money to do that kind of work.
You do need some money to pay your bills. Einstein made that money by being a patent clerk. I don’t know how you make your money to live. Of course you don’t have to tell me, and I respect it if that’s private information.
For all I know you could be making money by being a patent clerk like Einstein.
A scientist who can’t work on his grant projects because of the government shutdown could use his free time to do the kind of work that you are doing.
If you don’t like the label “theoretical”, that’s fine. If you want to propose a different label that distinguishes your approach from the making-fancy-expensive-experiments approach, I’m open to using another label.
I think in the last decades we have had an explosion in the amount of data in biology. I think that organising that data into theories lags behind. I think it takes less effort to advance biology by organising existing data into theories and doing a bit of phenomenology than by pushing further for knowledge produced with expensive equipment.
If I phrase it that way, would you agree?
This can be true but also suboptimal. I’m sure that given enough cleverness and effort, we could extract a lot of genetic causes out of existing SNP databases—but why bother when we can wait a decade and sequence everyone for $100 a head? People aren’t free, and equipment both complements and substitutes for them.
I assume you’re referring to my DNB meta-analysis? Yes, it’s not gathering primary data—I did think about doing that early on, which is why I carefully compiled all anecdotes mentioning IQ tests in my FAQ, but I realized that between the sheer heterogeneity, lack of a control group, massive selection effects, etc, the data was completely worthless.
But I can only gather the studies into a meta-analysis because people are running these studies. And I need a lot of data to draw any kind of conclusion. If n-back studies had stopped in 2010, I’d be out of luck, because with the studies up to 2010, I can exclude zero as the net effect, but I can’t make a rigorous statement about the effect of passive vs active control groups. (In fact, it’s only with the last 3 or 4 studies that the confidence intervals for the two groups stopped overlapping.) And these studies are expensive. I’m corresponding with one study author to correct the payment covariate, and it seems that on average participants were paid $600 - and there were 40, so they blew $24,000 just on paying the subjects, never mind paying for the MRI machine, the grad students, the professor time, publication, etc. At this point, the total cost of the research must be well into the millions of dollars.
It’s true that it’s a little irritating that no one has published a meta-analysis on DNB when it’s not that difficult for a random person like myself to do it and requires little in the way of resources—but that doesn’t change the fact that I still needed those dozens of professionals to run all these very expensive experiments to provide grist for the mill.
To go way up to Einstein, he was drawing on a lot of expensive data like that which showed the Mercury anomaly, and then was verified by very expensive data (I shudder to think how much those expeditions must have cost in constant dollars). Without that data, he would just be another… string theorist. Not Einstein.
Not by being a patent clerk, no. :)
To a very limited extent. There have to be enough studies to productively review, and there have to be no existing reviews you’re duplicating. To give another example: suppose I had been furloughed and wanted to work on a creatine meta-analysis. I get as far as I have gotten now—not that hard, maybe 10 hours of work—and I realize there are only 3 studies. Now what? Well, what I am doing is waiting a few months for 2 scientists to reply, and then I’ll wait another 5 or 10 years for governments to fund more psychology studies which happen to use creatine. But in no way can I possibly “finish” this even given months of government-shutdown-time.
I don’t think that’s a stupid or obviously incorrect claim, but I don’t think it’s right. Equipment is advancing fast (if not always as fast as my first example of genotyping/sequencing), so it’d be surprising to me if you could do more work by ignoring potential new data and reprocessing old work, and more generally, even though stuff like meta-analysis is accessible to anyone for free (case in point: myself), we don’t see anyone producing any impressive discoveries. Case in point: more than a few researchers already believed n-back might be an artifact of the control groups before I started my meta-analysis—my results are a welcome confirmation, not a novel discovery; or to use your vitamin D example, yes, it’s cool that we found an effect of vitamin D on sleep (I certainly believe it), but the counterfactual of “QS does not exist” is not “vitamin D’s effect on sleep goes unknown” but “Gominak discovers the effect on her patients and publishes a review paper in 2012 arguing that vitamin D affects sleep”.
LOL. That’s, um, not exactly true.
Let’s take a new drug trial. You want to find out whether the drug has certain (specific, detectable) effects. Could you please explain how a Bayesian approach to the results of the trial would make it impossible to make a Type I error, that is, a false positive: decide that the drug does have effects while in fact it does not?
I don’t. A real Bayesian doesn’t. The Bayesian wants to know the probability with which the drug will improve the well-being of a patient.
The output of a Bayesian analysis isn’t a truth value but a probability.
So is the output of a frequentist analysis.
However real life is full of step functions which translate probabilities into binary decisions. The FDA needs to either approve the drug or not approve the drug.
Saying “I will never make a Type I error because I will never make a hard decision” doesn’t look good as evidence for the superiority of Bayes...
Decisions are not the result of statistical tests but of utility functions. A Bayesian takes the probability that he gets from his statistics and puts it into his utility function.
Type I errors are a feature of statistical tests, not of decision functions.
It’s a huge theoretical advance to move from Aristotelianism to Bayesianism. Maybe reading http://slatestarcodex.com/2013/08/06/on-first-looking-into-chapmans-pop-bayesianism/ might help you.
I doubt it. I already did and clearly it didn’t help :-P
There’s a difference between asking yourself “Does this drug work better than other drugs?” and then deciding, based on the answer to that question, whether or not to approve the drug, and asking “What’s the probability that the drug works?” and making a decision based on that.
In practice, the FDA does ask its statistical tools “Does this drug work better than other drugs?” and then decides on that basis whether to approve the drug.
Why is that a problem? Take an issue like developing new antibiotics. Antibiotics are an area where there’s a consensus that not enough money goes into developing new ones. The special need comes from the fact that bacteria can develop resistance to drugs.
A Bayesian FDA could just change the utility factor that goes into calculating the value of approving a new antibiotic, skipping the whole “Does this drug work?” question and instead focusing on the question “What’s the expected utility of approving the drug?”
The Bayesian FDA could get a probability value that the drug works from the trial, and another number to quantify the seriousness of side effects. Those numbers can go together into a utility function for making a decision.
Developing a good framework that the FDA could use to make such decisions would be theoretical work. It’s the kind of work into which not enough intellectual effort goes, because scientists would rather play with fancy equipment.
If the FDA published utility values for the drugs it approves, that would also help insurance companies. An insurance company could sell you, for a certain price, a policy that pays for drugs that exceed a certain utility value.
You could simply factor the file drawer effect into such a model. If a company preregisters a trial and doesn’t publish it, the utility score of the drug goes down. Preregistered trials count more towards the utility of the drug than trials that aren’t preregistered, so you create an incentive for registration. You can do all sorts of things when you think about designing a utility function that goes beyond “Does this drug work better than existing ones?” (Yes/No) and “Is it safe?” (Yes/No).
You can even ask whether the FDA should do approval at all. You could just allow all drugs but say that insurance only pays for drugs with a certain demonstrated utility score. Just pay Big Pharma more for drugs that have high demonstrated utility.
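The proposal above can be sketched in a few lines. Everything here—the weights, the preregistration bonus, the file-drawer penalty, and the numbers—is invented for illustration and is in no way a real FDA procedure:

```python
# Hypothetical sketch of the utility-score idea; all weights are invented.
def drug_utility(p_works, benefit, p_side_effect, harm,
                 preregistered, unpublished_trials):
    """Expected utility of approving a drug, with an (arbitrary) bonus for
    preregistered evidence and a penalty for file-drawered trials."""
    expected = p_works * benefit - p_side_effect * harm
    if preregistered:
        expected *= 1.1                     # preregistered evidence counts more
    expected -= 0.5 * unpublished_trials    # file drawer penalty
    return expected

# A new antibiotic: resistance concerns would raise the benefit term.
u = drug_utility(p_works=0.6, benefit=10.0, p_side_effect=0.2, harm=3.0,
                 preregistered=True, unpublished_trials=1)
approve = u > 0   # the decision comes from the utility, not from a yes/no test
```

The design choice being argued for is that the binary step happens after the utility calculation, not inside the statistics.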
There you have a model of an FDA that wouldn’t make any Type I errors. I sketched the basis of a solution to a theoretical problem that JoshuaZ considered unsolvable, in an afternoon.
*I would add that if you want to end the war on drugs, this proposal matters a lot. (Details left as an exercise for the reader.)
Consider Alice and Bob. Alice is a mainstream statistician, aka a frequentist. Bob is a Bayesian.
We take our clinical trial results and give them to both Alice and Bob.
Alice says: the p-value for the drug effectiveness is X. This means that there is X% probability that the results we see arose entirely by chance while the drug has no effect at all.
Bob says: my posterior probability for the drug being useless is Y. This means Bob believes that there is a (1-Y)% probability that the drug is effective and a Y% probability that it has no effect.
Given that both are competent and Bob doesn’t have strong priors, X should be about the same as Y.
Do note that both Alice and Bob provided a probability as the outcome.
Now after that statistical analysis someone, let’s call him Trent, needs to make a binary decision. Trent says “I have a threshold of certainty/confidence Z. If the probability of the drug working is greater than Z, I will make a positive decision. If it’s lower, I will make a negative decision”.
Alice comes forward and says: here is my probability of the drug working, it is (1-X).
Bob comes forward and says: here is my probability of the drug working, it is (1-Y).
So, you’re saying that if Trent relies on Alice’s number (which was produced in the frequentist way) he is in danger of committing a Type I error. But if Trent relies on Bob’s number (which was produced in the Bayesian way) he cannot possibly commit a Type I error. Yes?
And then you start to fight the hypothetical and say that Trent really should not make a binary decision. He should just publish the probability and let everyone make their own decisions. Maybe—that works in some cases and doesn’t work in others. But Trent can publish Alice’s number, and he can publish Bob’s number—they are pretty much the same and both can be adequate inputs into some utility function. So where exactly is the Bayesian advantage?
Why? X is P(results >= what we saw | effect = 0), whereas Y is P(effect < costs | results = what we saw). I can see no obvious reason those would be similar, not even if we assume costs = 0; p(results = what we saw | effect = 0) = p(effect = 0 | results = what we saw) iff p_{prior}(result = what we saw) = p_{prior}(effect = 0) (where the small p’s are probability densities, not probability masses), but that’s another story.
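The objection above can be illustrated numerically (all numbers invented): with a skeptical prior, the posterior probability of “no benefit” can be several times larger than the p-value, because the two quantities condition on different things. This uses a conjugate normal model as a deliberately simple stand-in for a real trial analysis.

```python
# Invented numbers: p-value vs posterior probability answer different questions.
from statistics import NormalDist
from math import sqrt

xbar, se = 0.30, 0.15        # observed mean effect and its standard error
# Frequentist: one-sided p-value under the null "effect = 0".
p_value = 1 - NormalDist(0, se).cdf(xbar)

# Bayesian: prior effect ~ N(0, tau^2); the posterior is also normal.
tau = 0.10                   # skeptical prior: most true effects are small
post_var = 1 / (1 / tau**2 + 1 / se**2)
post_mean = post_var * (xbar / se**2)
p_no_benefit = NormalDist(post_mean, sqrt(post_var)).cdf(0.0)
# Here p_value is ~0.02 while p_no_benefit is ~0.13: X and Y need not match.
```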
You have two samples: one was given the drug, the other was given the placebo. You have some metric for the effect you’re looking for, a value of interest.
The given-drug sample has a certain distribution of the values of your metric which you model as a random variable. The given-placebo sample also has a distribution of these values (different, of course) which you also model as a random variable.
The statistical questions are whether these two random variables are different, in which way, and how confident you are of the answers.
For simple questions like that (and absent strong priors) the frequentists and the Bayesians will come to very similar conclusions and very similar probabilities.
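A quick toy check of this claim (invented numbers, normal model): with a very diffuse prior, standing in for “no strong priors”, the Bayesian tail probability that the effect is non-positive nearly coincides with the frequentist one-sided p-value.

```python
# Invented numbers: flat-prior Bayesian tail probability ~ one-sided p-value.
from statistics import NormalDist
from math import sqrt

xbar, se = 0.30, 0.15
p_one_sided = 1 - NormalDist(0, se).cdf(xbar)

tau = 100.0                  # very diffuse prior ("no strong priors")
post_var = 1 / (1 / tau**2 + 1 / se**2)
post_mean = post_var * (xbar / se**2)
p_effect_nonpositive = NormalDist(post_mean, sqrt(post_var)).cdf(0.0)
# The two numbers agree to several decimal places.
```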
Yes, but the p-value and the posterior probability aren’t even the same question, are they?
No, they are not.
However for many simple cases—e.g. where we are considering only two possible hypotheses—they are sufficiently similar.
No. You don’t understand null hypothesis testing. It doesn’t measure whether the results arose entirely by chance. It measures whether a specific null hypothesis can be rejected.
I hate to disappoint you, but I do understand null hypothesis testing. In this particular example the specific null hypothesis is that the drug has no effect and therefore all observable results arose entirely by chance.
Almost no drug has no effect. Most drugs change the patient and produce either a slight advantage or a slight disadvantage.
If what you’re saying were correct, I could simply run n=1 experiments.
You are really determined to fight the hypothetical, aren’t you? :-) Let me quote myself with the relevant part emphasized: “You want to find out whether the drug has certain (specific, detectable) effects.”
And how would they help you? There is the little issue of noise. You cannot detect any effects below the noise floor, and for n=1 that floor is going to be pretty high.
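The “noise floor” point can be made quantitative with a standard power calculation (assumed numbers: unit outcome variance, a two-sided 5% level, 80% power): the smallest detectable effect shrinks like 1/sqrt(n), so an n=1 experiment can only see enormous effects.

```python
# Sketch (assumed numbers): minimum detectable effect vs sample size.
from math import sqrt

sigma = 1.0                      # assumed outcome standard deviation
z_alpha, z_power = 1.96, 0.84    # 5% two-sided level, 80% power

def min_detectable_effect(n):
    """Smallest effect a z-test can detect with the above level and power."""
    return (z_alpha + z_power) * sigma / sqrt(n)

effects = {n: min_detectable_effect(n) for n in (1, 25, 100)}
# n=1 needs an effect of ~2.8 sigma; n=100 can detect ~0.28 sigma.
```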
A p-value isn’t the probability that a drug has certain (specific, detectable) effects. 1-p isn’t either.
No, I’m accepting it. The probability of a drug having zero effects is 0. If your statistics give you an answer that assigns a probability other than 0 to a drug having zero effects, your statistics are wrong.
I think your answer suggests the idea that an experiment might provide actionable information.
But you still claim that every experiment provides an actionable probability when interpreted by a frequentist.
If you give a Bayesian your priors and then get a posterior probability from the Bayesian, that probability is in every case actionable.
Again: the probability that a drug has no specific, detectable effects is NOT zero.
Huh? What? I don’t even… Please quote me.
What do you call an “actionable” probability? What would be an example of a “non-actionable” probability?
I don’t care about detectability when I take a drug. I care about whether it helps me. I want a number that tells me the probability of the drug helping me. I don’t want the statistician to answer a different question.
Detectability depends on the power of a trial.
If a frequentist gives you some number after analysing an experiment, you can’t just feed that number into a decision function. You have to think about issues such as whether the experiment had enough power to pick up an effect.
If a Bayesian gives you a probability, you don’t have to think about such issues, because the Bayesian has already integrated your prior knowledge. The probability that the Bayesian gives you can be used directly.
Drug trials are neither designed to, nor capable of answering questions like this.
Whether a drug will help you is a different probability that comes out of a complicated evaluation for which the drug trial results serve as just one of the inputs.
I am sorry, you’re speaking nonsense.
That evaluation is in its nature Bayesian. Bayes’ rule is about combining different probabilities.
At the moment there’s no systematic way of going about it. That’s where theory development is needed. I would like someone like the FDA to write down all their priors and then provide a computer analysis tool that actually calculates that probability.
If the priors are correct, then a correct Bayesian analysis provides me exactly the probability I should believe after reading the study.
No, seeing a bunch of novel true discoveries indicates a successful field. However, it’s normally hard to independently verify the truth of novel discoveries except in cases where those discoveries have applications.
This seems like a nitpick more than a serious remark: obviously one is talking about the true discoveries, and giving major examples of them in biology is not at all difficult. The discovery of RNA interference is in the biochem end of things, while a great number of discoveries have occurred in paleontology as well as using genetics to trace population migrations (both humans and non-humans).
So one question here is, for what types of discoveries is your prior high that the discovery is bogus? And how will you tell? General skepticism probably makes sense for a lot of medical “breakthroughs” but there’s a lot of biology other than those.