If the treatment is relatively mild and the dropouts are comparable between groups, then I am not sure that per-protocol analysis will introduce much bias. What do you think? In that case it can be a decent tool for enhancing power, although the results will always be considered “post hoc” and “hypothesis-generating”.
From experience, I would say that intention-to-treat analysis is the standard in large studies of drugs and supplements, while per-protocol is often performed as a secondary analysis, especially when the ITT result is marginal and you have to go fishing for something to report and to justify follow-up research.
The bias introduced is probably usually small, especially when the dropout rate is low. But in those cases you get very little “enhanced power”. You would be better off just not bothering with a per-protocol analysis, as you would get nearly the same result from an ordinary intention-to-treat analysis based on the group each person was originally randomized to (treatment or control).
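To put a rough number on “very little”, here’s a back-of-the-envelope power comparison. All the numbers are invented for illustration: 100 per arm, a standardized effect of 0.5 among compliers, 5% dropout in the treatment arm.

```python
# Back-of-the-envelope power comparison, ITT vs per-protocol.
# All numbers invented: 100 per arm, standardized effect 0.5 among
# compliers, 5% dropout in the treatment arm (dropouts get no benefit,
# which dilutes the ITT effect by the dropout fraction).
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower()
n_per_arm, effect, dropout = 100, 0.5, 0.05
n_pp = round(n_per_arm * (1 - dropout))  # treatment-arm completers

# ITT: everyone stays in, but the effect is diluted by noncompliers.
itt = power.power(effect_size=effect * (1 - dropout), nobs1=n_per_arm,
                  alpha=0.05)

# Per-protocol: undiluted effect, but a smaller treatment arm
# (ratio = control n / treatment-completer n).
pp = power.power(effect_size=effect, nobs1=n_pp, alpha=0.05,
                 ratio=n_per_arm / n_pp)

print(f"ITT power:          {itt:.3f}")
print(f"per-protocol power: {pp:.3f}")
```

With these made-up numbers the two come out within a couple of percentage points of each other, so the “gain” is basically noise.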
The only situation in which a per-protocol analysis is worth doing is one where it makes a real difference to the statistics, and that is exactly the situation in which it risks introducing bias. So I think it might just never be worth it: it replaces a known problem (due to dropouts, some people in the yoga group didn’t do all the yoga) with an unknown problem (the yoga group is post-selected nonrandomly), affecting exactly the same number of participants, so a problem of the same scale.
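A toy simulation makes that trade concrete. Everything here is invented (the effect size, the dropout mechanism, the idea that unmeasured “motivation” drives both compliance and outcome), but it shows the asymmetry: ITT’s dilution toward the null is predictable, while per-protocol’s error depends on a selection mechanism you can’t observe.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(selection, n_per_arm=100, true_effect=0.5, dropout_rate=0.2,
             n_sims=2000):
    """Average ITT and per-protocol estimates over many simulated trials.

    selection=0.0: dropouts in the treatment arm are random.
    selection=1.0: the least 'motivated' participants drop out, and
    motivation also improves the outcome in both arms (unmeasured).
    """
    n_drop = int(dropout_rate * n_per_arm)
    itt_est, pp_est = [], []
    for _ in range(n_sims):
        motiv_t = rng.normal(size=n_per_arm)  # unmeasured prognostic factor
        motiv_c = rng.normal(size=n_per_arm)

        # Dropout score: a blend of motivation and pure noise.
        score = (selection * motiv_t
                 + (1 - selection) * rng.normal(size=n_per_arm))
        completed = np.ones(n_per_arm, dtype=bool)
        completed[np.argsort(score)[:n_drop]] = False  # lowest scores drop out

        # Noncompliers get no treatment benefit; motivation helps everyone.
        y_t = true_effect * completed + motiv_t + rng.normal(size=n_per_arm)
        y_c = motiv_c + rng.normal(size=n_per_arm)

        itt_est.append(y_t.mean() - y_c.mean())
        pp_est.append(y_t[completed].mean() - y_c.mean())
    return np.mean(itt_est), np.mean(pp_est)

for sel in (0.0, 1.0):
    itt, pp = simulate(sel)
    print(f"selection={sel:.1f}: ITT ~ {itt:.2f}, per-protocol ~ {pp:.2f} "
          f"(true effect 0.5)")
```

With random dropout (selection=0), ITT lands near 0.5 × 0.8 = 0.4 and per-protocol recovers the true 0.5; with motivation-driven dropout (selection=1), ITT is unchanged but per-protocol overshoots to roughly 0.85.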
In the yoga context, I would say that if it’s really good at curing depression, then surely its effect size is going to be big enough to swamp a small number of yoga dropouts.
They also only have 32 participants in the trial. I don’t know if it’s a real rule, but I feel like the smaller the dataset, the more you should stick to really basic, simple measures.
It’s a good question. I have the intuition that just a little potential for bias can go a long way toward messing up the estimated effect, so allowing this practice is net negative despite the gains in power. The dropouts might be similar on demographics but not on something unmeasured like motivation. My view comes from seeing many failed replications and from playing with datasets when they’re available, but I would love to be able to quantify this issue somehow. I would certainly predict that studies where the per-protocol finding differs from the ITT result will be far less likely to replicate.
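As one crude way to quantify it, here’s a sketch of exactly that failure mode, with every parameter invented: dropouts look identical to completers on a measured demographic (age), because dropout is driven entirely by unmeasured motivation, which also improves the outcome in both arms.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 16             # per arm, roughly the size of the trial in question
true_effect = 0.5  # invented
n_sims = 20000

pp_bias, age_gap = [], []
for _ in range(n_sims):
    age = rng.normal(40, 10, size=n)  # measured demographic
    motiv_t = rng.normal(size=n)      # unmeasured motivation, treatment arm
    motiv_c = rng.normal(size=n)      # same for the control arm

    # Dropout driven purely by motivation, independent of age (~25% drop out).
    completed = motiv_t > np.quantile(motiv_t, 0.25)

    # Noncompliers get no treatment benefit; motivation helps everyone.
    y_t = true_effect * completed + motiv_t + rng.normal(size=n)
    y_c = motiv_c + rng.normal(size=n)

    pp_bias.append(y_t[completed].mean() - y_c.mean() - true_effect)
    age_gap.append(age[completed].mean() - age[~completed].mean())

print(f"mean age gap, completers minus dropouts: {np.mean(age_gap):+.2f} years")
print(f"mean per-protocol bias: {np.mean(pp_bias):+.2f} "
      f"(true effect {true_effect})")
```

The age check passes by construction, yet the per-protocol estimate comes out inflated by roughly +0.4 on a true effect of 0.5, and nothing in the measured data would warn you.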