In the case where you get “W”, you update to P(W|“W”) = 99% and continue taking melatonin. But in the case where you get “~W”, you update to P(W|“~W”) = 17%. Given the massive RoI you calculated for melatonin, it sounds like it’s worth taking even if there’s only a 17% chance that it’s actually effective.
The Bayes calculation is (0.05 * 0.8) / ((0.05 * 0.8) + (0.95 * 0.2)) = 0.1739..., right? (A second experiment would knock it down to ~0.01, apparently.)
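Spelled out as a quick R sketch (same assumptions as that calculation: prior P(W) = 0.8 and a 5% false-negative rate), including the second iteration:

> prior <- 0.8        # P(W): prior probability that melatonin works
> p_neg_w <- 0.05     # P("~W"|W): negative result despite it working
> p_neg_notw <- 0.95  # P("~W"|~W): negative result when it doesn't work
> post1 <- (p_neg_w * prior) / (p_neg_w * prior + p_neg_notw * (1 - prior))
> post1               # 0.1739...
> post2 <- (p_neg_w * post1) / (p_neg_w * post1 + p_neg_notw * (1 - post1))
> post2               # ~0.011, the second-experiment figure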
I didn’t notice that. I didn’t realize I was making an assumption that on a negative experimental result, I’d immediately stop buying whatever. Now I suddenly remember the Wikipedia article talking about iterating… After I get one experimental result, I need to redo the expected-value calculation and re-run the VoI on further experiments; sigh, I guess I’d better reword the melatonin section and add a footnote to the master version explaining this!
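To make that concrete, the redone expected-value check would look something like this (with made-up benefit and cost figures purely for illustration, not the essay’s actual numbers):

> p <- 0.1739         # posterior P(W) after one negative result
> benefit <- 100      # hypothetical: annual value if melatonin actually works
> cost <- 10          # hypothetical: annual cost of buying it
> p * benefit - cost  # still positive here, so I'd keep buying while testing further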
A brief terminology correction: the “value of perfect information” would be $41, not $205 (i.e. it has to include your 20% estimate that melatonin doesn’t work). If you replace that phrase with “value of a perfect negative result”, you should be fine.
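That is, presumably: value of perfect information = P(~W) × $205 = 0.2 × $205 = $41.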
It’s also possible that P(W|“~W”) is way lower than .05, and so the test could be better than that calculation makes it look. This is something you can figure out from basic stats and your experimental design, and I strongly recommend actually running the numbers. Psychology has for years been plagued by studies that are too small to actually provide valuable information, as people in general aren’t good intuitive statisticians.
As it happens, I learned how to do basic power calculations not that long ago. I didn’t do an explicit calculation for the melatonin trial because I didn’t randomize selection, instead using an alternating-days design (and not always following that), so I thought: why bother doing one in retrospect?
But if we wave that away, the power seems fine. I have something like 141 days of data, of which around 90-100 are usable, giving me maybe <50 pairs? If I fire up R, load in the two means and the standard deviation (which I had left over from calculating the effect size), and play with the numbers, then to get an 85% chance of finding an effect at p=0.01:
> library(pwr)
> pwr.t.test(d=(456.4783 - 407.5312) / 131.4656, power=0.85, sig.level=0.01, type="paired", alternative="greater")
Paired t test power calculation
n = 84.3067
d = 0.3723187
sig.level = 0.01
power = 0.85
alternative = greater
NOTE: n is number of *pairs*
If I relax the p=0.01 to 0.05, it looks like I should have had a good shot at detecting the effect:
> pwr.t.test(d=(456.4783 - 407.5312) / 131.4656, power=0.85, sig.level=0.05, type="paired", alternative="greater")
Paired t test power calculation
n = 53.24355
So, it’s not great, but it’s at least not terribly wrong?
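As a further sanity check, one could invert the calculation and solve for the power of the sample I actually have (guessing ~47 usable pairs from the day counts above):

> pwr.t.test(d=(456.4783 - 407.5312) / 131.4656, n=47, sig.level=0.05, type="paired", alternative="greater")

which, on my arithmetic, reports a power of roughly 0.8, consistent with “a good shot”.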
EDIT: Just realized that I equivocated over days vs. pairs in my existing power analyses; one was wrong, but I apparently avoided the error in another. Phew.
Thanks for the comments.
I’ll reword that.
I’ll need to think about the Adderall point.
You’re welcome!
That’s how I did it.
I’m wondering why 0.05 (alpha) was used in that formula. The true-positive and false-negative rates depend on the statistical power (1 − beta) and beta, and with beta = 0.2, the probability that melatonin is working given a negative result is 0.457, not 0.1739.
The “melatonin is working” branch (prior P(W) = 0.8) has two possibilities:
True positive, P(“W”|W) = 1 − b = 0.8
False negative, P(“~W”|W) = b = 0.2
The “melatonin is not working” branch (prior P(~W) = 0.2) has two possibilities:
False positive, P(“W”|~W) = a = 0.05
True negative, P(“~W”|~W) = 1 − a = 0.95
P(W|“~W”) = P(“~W”|W) * P(W) / (P(“~W”|W) * P(W) + P(“~W”|~W) * P(~W)) =
(0.2 * 0.8) / ((0.2 * 0.8) + (0.95 * 0.2)) = 0.457, not 0.1739 (roughly a 2.6-fold difference)
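Checking both versions in R makes the difference obvious:

> (0.05 * 0.8) / ((0.05 * 0.8) + (0.95 * 0.2))  # false-negative rate = alpha = 0.05: 0.1739
> (0.20 * 0.8) / ((0.20 * 0.8) + (0.95 * 0.2))  # false-negative rate = beta = 0.20: 0.4571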
I’m a bit confused because I’m getting different results, but maybe I’m wrong and someone can correct me?
I’m planning to run a blind experiment with melatonin, but I want to learn more stats and better understand VoI before I start.