I doubt it. I already did and clearly it didn’t help :-P
There is a difference between asking yourself “Does this drug work better than other drugs?” and deciding on the basis of that answer whether to approve the drug, and asking “What’s the probability that the drug works?” and making a decision based on that.
In practice the FDA does ask its statistical tools “Does this drug work better than other drugs?” and then decides on that basis whether to approve the drug.
Why is that a problem? Take an issue like developing new antibiotics. Antibiotics are an area where there is a consensus that not enough money goes into developing new ones. The special need arises from the fact that bacteria can develop resistance to drugs.
A Bayesian FDA could just change the utility factor that goes into calculating the value of approving a new antibiotic. It could skip the whole “Does this drug work?” question and instead focus on the question “What’s the expected utility from approving the drug?”
The Bayesian FDA could get a probability that the drug works from the trial and another number to quantify the seriousness of side effects. Those numbers can go together into a utility function for making a decision.
Developing a good framework that the FDA could use to make such decisions would be theoretical work. It is the kind of work that gets too little intellectual effort because scientists would rather play with fancy equipment.
If the FDA published utility values for the drugs that it approves, that would also help insurance companies. An insurance company could sell you, for a certain price, a policy that pays for drugs exceeding a certain utility value.
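To make the idea concrete, here is a minimal sketch of such a utility function. Everything in it is an assumption for illustration: the field names, the units, the linear form of the utility, and the `need_weight` multiplier standing in for the "utility factor" that could be raised for antibiotics. It is not a description of anything the FDA actually does.

```python
from dataclasses import dataclass


@dataclass
class DrugEvidence:
    p_works: float             # probability that the drug works (assumed input from the trial)
    benefit_if_works: float    # utility gained if it works (hypothetical units)
    side_effect_cost: float    # expected utility lost to side effects (hypothetical units)
    need_weight: float = 1.0   # hypothetical policy multiplier, e.g. > 1 for new antibiotics


def expected_utility(ev: DrugEvidence) -> float:
    """Weighted expected benefit minus the expected side-effect cost."""
    return ev.need_weight * ev.p_works * ev.benefit_if_works - ev.side_effect_cost


def approve(ev: DrugEvidence, threshold: float = 0.0) -> bool:
    """Approve when the expected utility exceeds the chosen threshold."""
    return expected_utility(ev) > threshold


# Two made-up drugs with identical evidence; only the policy weight differs.
painkiller = DrugEvidence(p_works=0.5, benefit_if_works=10.0, side_effect_cost=6.0)
antibiotic = DrugEvidence(p_works=0.5, benefit_if_works=10.0, side_effect_cost=6.0,
                          need_weight=1.5)

print(expected_utility(painkiller), approve(painkiller))
print(expected_utility(antibiotic), approve(antibiotic))
```

With these invented numbers the same trial evidence leads to a rejection for the ordinary drug and an approval for the antibiotic, purely because the weight on the benefit was changed.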
You could simply factor the file drawer effect into such a model. If a company preregisters a trial and doesn’t publish it, the utility score of the drug goes down. Preregistered trials count more towards the utility of the drug than trials that aren’t preregistered, so you create an incentive for registration. You can do all sorts of things when you think about designing a utility function that goes beyond “Does this drug work better than existing ones?” (Yes/No) and “Is it safe?” (Yes/No).
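One way such a score could be wired up is sketched below. The weights, the penalty size, and the choice of a weighted mean are all hypothetical; the only parts taken from the proposal are that preregistered trials count more and that a preregistered but unpublished trial lowers the score.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    effect_estimate: float  # estimated benefit from this trial (hypothetical units)
    preregistered: bool
    published: bool


# Hypothetical constants, chosen only for illustration.
PREREGISTERED_WEIGHT = 1.0
UNREGISTERED_WEIGHT = 0.5
UNPUBLISHED_PENALTY = 1.0  # subtracted per preregistered trial that was never published


def utility_score(trials: list[Trial]) -> float:
    """Weighted mean of published effect estimates, minus a file-drawer penalty."""
    weighted_sum = 0.0
    total_weight = 0.0
    penalty = 0.0
    for t in trials:
        if t.published:
            w = PREREGISTERED_WEIGHT if t.preregistered else UNREGISTERED_WEIGHT
            weighted_sum += w * t.effect_estimate
            total_weight += w
        elif t.preregistered:
            # Registered but never published: treat as a file-drawer trial.
            penalty += UNPUBLISHED_PENALTY
        # Unregistered and unpublished trials are invisible, so they are ignored.
    mean_effect = weighted_sum / total_weight if total_weight else 0.0
    return mean_effect - penalty


published_only = [Trial(4.0, True, True), Trial(8.0, False, True)]
with_file_drawer = published_only + [Trial(0.0, True, False)]

print(utility_score(published_only))
print(utility_score(with_file_drawer))
```

The second score is lower than the first by exactly the penalty, which is the incentive the proposal is after: hiding a registered trial costs the sponsor something.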
You can even ask whether the FDA should do approval at all. You could just allow all drugs but say that insurance only pays for drugs with a certain demonstrated utility score. Just pay Big Pharma more for drugs that have high demonstrated utility.
There you have a model of an FDA that wouldn’t commit any Type I errors. In an afternoon, I solved the basis of a theoretical problem that JoshuaZ considered unsolvable.
*I would add that if you want to end the war on drugs, this proposal matters a lot. (Details left as an exercise for the reader.)
Consider Alice and Bob. Alice is a mainstream statistician, aka a frequentist. Bob is a Bayesian.
We take our clinical trial results and give them to both Alice and Bob.
Alice says: the p-value for the drug effectiveness is X. This means that there is a probability X that the results we see arose entirely by chance while the drug has no effect at all.
Bob says: my posterior probability for the drug being useless is Y. This means Bob believes that there is a probability (1-Y) that the drug is effective and a probability Y that it has no effect.
Given that both are competent and Bob doesn’t have strong priors, X should be about the same as Y.
Do note that both Alice and Bob provided a probability as the outcome.
Now after that statistical analysis someone, let’s call him Trent, needs to make a binary decision. Trent says “I have a threshold of certainty/confidence Z. If the probability of the drug working is greater than Z, I will make a positive decision. If it’s lower, I will make a negative decision”.
Alice comes forward and says: here is my probability of the drug working, it is (1-X).
Bob comes forward and says: here is my probability of the drug working, it is (1-Y).
So, you’re saying that if Trent relies on Alice’s number (which was produced in the frequentist way) he is in danger of committing a Type I error. But if Trent relies on Bob’s number (which was produced in the Bayesian way) he cannot possibly commit a Type I error. Yes?
And then you start to fight the hypothetical and say that Trent really should not make a binary decision. He should just publish the probability and let everyone make their own decisions. Maybe—that works in some cases and doesn’t work in others. But Trent can publish Alice’s number, and he can publish Bob’s number—they are pretty much the same and both can be adequate inputs into some utility function. So where exactly is the Bayesian advantage?
Why? X is P(results >= what we saw | effect = 0), whereas Y is P(effect < costs | results = what we saw). I can see no obvious reason those would be similar, not even if we assume costs = 0; p(results = what we saw | effect = 0) = p(effect = 0 | results = what we saw) iff p_{prior}(result = what we saw) = p_{prior}(effect = 0) (where the small p’s are probability densities, not probability masses), but that’s another story.
You have two samples: one was given the drug, the other was given the placebo. You have some metric for the effect you’re looking for, a value of interest.
The given-drug sample has a certain distribution of the values of your metric which you model as a random variable. The given-placebo sample also has a distribution of these values (different, of course) which you also model as a random variable.
The statistical questions are whether these two random variables are different, in which way, and how confident you are of the answers.
For simple questions like that (and absent strong priors) the frequentists and the Bayesians will come to very similar conclusions and very similar probabilities.
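A small sketch of one setting where the two numbers do line up. The assumptions are strong and should be read as such: the outcome is normally distributed, the standard deviation `sigma` is known and shared by both arms, Bob uses a flat prior on the effect, and the data are made up. Under those assumptions the one-sided p-value and the posterior probability that the true effect is at or below zero are the same number. Note that the Bayesian quantity here is P(effect <= 0 | data), not the probability of an exactly zero effect, and the agreement need not hold with an informative prior or a prior that puts mass on exactly zero.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sided_p_value(drug, placebo, sigma):
    """P(difference in means >= observed | true effect = 0), known common sigma."""
    diff = mean(drug) - mean(placebo)
    se = sigma * sqrt(1 / len(drug) + 1 / len(placebo))
    return 1 - NormalDist(0, se).cdf(diff)


def posterior_prob_no_benefit(drug, placebo, sigma):
    """P(true effect <= 0 | data) under a flat prior on the effect, known common sigma."""
    diff = mean(drug) - mean(placebo)
    se = sigma * sqrt(1 / len(drug) + 1 / len(placebo))
    # With a flat prior, the posterior for the effect is Normal(diff, se).
    return NormalDist(diff, se).cdf(0)


# Made-up measurements of the metric of interest in each arm.
drug = [5.1, 6.0, 5.6, 6.3]
placebo = [4.8, 5.2, 5.0, 5.4]

x = one_sided_p_value(drug, placebo, sigma=0.5)
y = posterior_prob_no_benefit(drug, placebo, sigma=0.5)
print(x, y)
```

The two functions answer differently worded questions, yet in this idealised case they return the same value up to floating-point error.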
Yes, but the p-value and the posterior probability aren’t even the same question, are they?
No, they are not.
However for many simple cases—e.g. where we are considering only two possible hypotheses—they are sufficiently similar.
No. You don’t understand null hypothesis testing. It doesn’t measure whether the results arose entirely by chance. It measures whether a specific null hypothesis can be rejected.
I hate to disappoint you, but I do understand null hypothesis testing. In this particular example the specific null hypothesis is that the drug has no effect and therefore all observable results arose entirely by chance.
Almost no drug has no effect. Most drugs change the patient and produce either a slight advantage or a slight disadvantage.
If what you are saying is correct, I could simply run n=1 experiments.
You are really determined to fight the hypothetical, aren’t you? :-) Let me quote myself with the relevant part emphasized: “You want to find out whether the drug has certain (specific, detectable) effects.”
And how would they help you? There is the little issue of noise. You cannot detect any effects below the noise floor and for n=1 that floor is going to be pretty high.
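To put a rough number on the noise floor, here is a sketch using the textbook normal approximation for a two-sided, two-sample z-test with a known common standard deviation. The choices of `alpha=0.05` and `power=0.8` are conventional defaults assumed for illustration, not anything stated in this thread.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(n_per_arm, sigma, alpha=0.05, power=0.8):
    """Smallest true difference in means detectable with the given power.

    Normal approximation: (z_{1 - alpha/2} + z_{power}) * sigma * sqrt(2 / n).
    """
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) * sigma * sqrt(2 / n_per_arm)


# Effect sizes are in units of the outcome's standard deviation (sigma = 1).
for n in (1, 10, 100, 1000):
    print(n, round(minimum_detectable_effect(n, sigma=1.0), 2))
```

With one subject per arm the detectable effect is about four standard deviations, and it shrinks only with the square root of the sample size, so a hundredfold increase in subjects buys a tenfold lower floor.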
A p-value isn’t the probability that a drug has certain (specific, detectable) effects. 1-p isn’t either.
No, I’m accepting it. The probability of a drug having zero effects is 0. If your statistics tell you that the probability of a drug having zero effects is anything other than 0, your statistics are wrong.
I think your answer suggests the idea that an experiment might provide actionable information.
But you still claim that every experiment provides an actionable probability when interpreted by a frequentist.
If you give a Bayesian your priors and then get a posterior probability back, that probability is in every case actionable.
Again: the probability that a drug has no specific, detectable effects is NOT zero.
Huh? What? I don’t even… Please quote me.
What do you call an “actionable” probability? What would be an example of a “non-actionable” probability?
I don’t care about detectability when I take a drug. I care about whether it helps me. I want a number that tells me the probability of the drug helping me. I don’t want the statistician to answer a different question.
Detectability depends on the power of a trial.
If a frequentist gives you some number after he has analysed an experiment, you can’t just plug that number into a decision function. You have to think about issues such as whether the experiment had enough power to pick up an effect.
If a Bayesian gives you a probability, you don’t have to think about such issues, because the Bayesian has already integrated your prior knowledge. The probability that the Bayesian gives you can be used directly.
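A sketch of what "integrating prior knowledge" could look like in the simplest case: a Normal prior on the true effect combined with a normally distributed trial estimate. The function name, the priors, and the trial numbers are all invented for illustration, and a real evaluation would involve far more than this one update.

```python
from math import sqrt
from statistics import NormalDist


def posterior_prob_benefit(prior_mean, prior_sd, observed_diff, standard_error):
    """P(true effect > 0 | data) for a Normal prior and a Normal likelihood."""
    prior_precision = 1 / prior_sd**2
    data_precision = 1 / standard_error**2
    post_precision = prior_precision + data_precision
    # Posterior mean is the precision-weighted average of prior mean and data.
    post_mean = (prior_precision * prior_mean + data_precision * observed_diff) / post_precision
    post_sd = sqrt(1 / post_precision)
    return 1 - NormalDist(post_mean, post_sd).cdf(0)


# The same made-up trial result read through two different priors.
skeptical = posterior_prob_benefit(prior_mean=0.0, prior_sd=0.2,
                                   observed_diff=0.65, standard_error=0.35)
vague = posterior_prob_benefit(prior_mean=0.0, prior_sd=10.0,
                               observed_diff=0.65, standard_error=0.35)
print(skeptical, vague)
```

The skeptical prior pulls the probability of benefit down compared with the vague one, so the usable number depends on the priors being written down explicitly, which is where the disagreement in this thread sits.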
Drug trials are neither designed to, nor capable of answering questions like this.
Whether a drug will help you is a different probability that comes out of a complicated evaluation for which the drug trial results serve as just one of the inputs.
I am sorry, you’re speaking nonsense.
That evaluation is by its nature Bayesian. Bayes’ rule is about combining different probabilities.
At the moment there is no systematic way of going about it. That’s where theory development is needed. I would want someone like the FDA to write down all their priors and then provide some computer analysis tool that actually calculates that probability.
If the priors are correct, then a correct Bayesian analysis gives me exactly the probability I should believe after I read the study.