Maybe, but my understanding is that that value is already being screened off: the something must have positive expected value in the first place, or you wouldn’t be using it at all.
(But I could be wrong, and I’ve already pinged Vaniver with a request to look things over since that’s the sort of basic conceptual confusion I couldn’t get myself out of.)
First off, kudos for discussing non-VoI reasons to run these experiments. Real decisions have many factors.
The eyeballed estimate of how much the experimental design reduces the value from perfect information should be replaced by a decision tree. If the experiment can’t give you enough data to change your position, then it’s not material.
Using the first example, where W is melatonin works and “W” is the experiment saying that melatonin works, it looks like you provided P(W)=.8, P(“W”|W)=.95, and P(“W”|~W)=.05. I assumed that >90% corresponded to a point estimate of 95%, and that the test was symmetric, which should get thought about more if you’re doing this seriously.
In the case where you get “W”, you update and P(W|”W”)=99% and you continue taking melatonin. But in the case where you get “~W”, you update and P(W|”~W”)=17%. Given the massive RoI you calculated for melatonin, it sounds like it’s worth taking even if there’s only a 17% chance that it’s actually effective. Rather than continuing blindly on, you’d probably continue the test until you had enough data to be sure / pin down your RoI calculation, but you should be able to map that out now before you start the experiment.
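One way to make that concrete: if the best action is “keep taking melatonin” in both the “W” and “~W” branches, the experiment has no decision value. A minimal R sketch, using placeholder payoffs (v_work and v_fail below are loosely based on numbers mentioned elsewhere in the thread, not figures from the post):

# Placeholder payoffs, for illustration only
v_work <- 205          # assumed annual net value if melatonin works
v_fail <- -10          # assumed annual cost if it doesn't
p_W    <- 0.8          # prior that melatonin works

ev_take <- function(p) p * v_work + (1 - p) * v_fail   # EV of continuing melatonin
best    <- function(p) max(ev_take(p), 0)              # 0 = EV of stopping

# Posteriors from above, and P("W") under the assumed error rates
p_given_pos <- 0.99
p_given_neg <- 0.17
p_pos <- p_W * 0.95 + (1 - p_W) * 0.05

ev_with_test    <- p_pos * best(p_given_pos) + (1 - p_pos) * best(p_given_neg)
ev_without_test <- best(p_W)
ev_with_test - ev_without_test   # ~0 (exactly 0 with unrounded posteriors): the test never changes the decision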
There’s a question of prior information here- from what you’ve written, it sounds like you should be more than 80% sure that melatonin worked for you. You might be interested in a different question- “melatonin still works for me”- which it might be reasonable to have an 80% prior on. If the uncertainty is about the value of taking melatonin, it seems like you could design a better experiment that narrows your uncertainty there (by looking for cognitive costs, or getting a better estimate of time saved, etc.).
A brief terminology correction: the “value of perfect information” would be $41, not $205 (i.e. it includes the 20% estimate that melatonin doesn’t work, so 0.2 * $205 = $41). If you replace that with “value of a perfect negative result” you should be fine.
In 3, you’re considering adding a new supplement, not stopping a supplement you already use. The “I don’t try Adderall” case has value $0, the “Adderall fails” case is worth -$40 (assuming you only bought 10 pills, and this number should be increased by your analysis time and a weighted cost for potential permanent side effects), and the “Adderall succeeds” case is worth $X-40-4099, where $X is the discounted lifetime value of the increased productivity due to Adderall, minus any discounted long-term side effect costs. If you estimate Adderall will work with p=.5, then you should try out Adderall if you estimate that .5(X-4179)>0 → X>4179. (Adderall working or not isn’t binary, and so you might be more comfortable breaking down the various “how effective Adderall is” cases when eliciting X, by coming up with different levels it could work at, their values, and then using a weighted sum to get X. This can also give you a better target with your experiment- “this needs to show a benefit of at least Y from Adderall for it to be worth the cost, and I’ve designed it so it has a reasonable chance of showing that.”)
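As a rough cross-check of that arithmetic, here’s a minimal R sketch of the same expected-value calculation (the cost figures and p=.5 are the ones quoted above; the X passed in at the end is just an illustrative guess):

p_works    <- 0.5     # assumed probability that Adderall works
cost_trial <- 40      # cost of the 10-pill trial ($)
cost_life  <- 4099    # discounted lifetime cost of taking Adderall ($)

# X = discounted lifetime value of the productivity gain
ev_try <- function(X) p_works * (X - cost_trial - cost_life) + (1 - p_works) * (-cost_trial)

# Break-even X, i.e. the X at which ev_try(X) = 0
X_breakeven <- cost_trial / p_works + cost_life   # 4179 when p_works = 0.5
ev_try(10000)                                     # positive for any estimate of X above the threshold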
One thing to notice is that the default case matters a lot. This asymmetry is because you switch decisions in different possible worlds- when you would take Adderall but stop, you’re in the world where Adderall doesn’t work, and when you wouldn’t take Adderall but do, you’re in the world where Adderall does work (in the perfect information case, at least). One of the ways you can visualize this is that you don’t penalize tests for giving you true negative information, and you reward them for giving you true positive information. (This might be worth a post by itself, and is very Litany of Gendlin.)
The rest is similar. I definitely agree with the last line: possibly a way to drive it home is to talk about dividing by ln(1.05), which is essentially multiplying by 20.5. If you can make a one-time investment that pays off annually until you die, that’s worth 20.5 times the annual return, and multiplying the value of something by 20 can often move it from not worth thinking about to worth thinking about.
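(A possible way to see where the 20.5 comes from, assuming continuous discounting: the present value of $1/year forever at a 5% annual discount rate is the integral of 1.05^(-t) from t = 0 to infinity, which is 1/ln(1.05) ≈ 20.5.)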
Thanks for the comments.
The Bayes calculation is (0.05 * 0.8) / ((0.05 * 0.8) + (0.95 * 0.2)) = 0.1739..., right? (A second experiment would knock it down to ~0.01, apparently.)

I didn’t notice that. I didn’t realize I was making an assumption that on a negative experimental result, I’d immediately stop buying whatever. Now I suddenly remember the Wikipedia article talking about iterating… After I get one experimental result, I need to redo the expected-value calculation, and re-run the VoI on further experiments; sigh, I guess I’d better reword the melatonin section and add a footnote to the master version explaining this!
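For reference, a minimal R sketch of that update and the iteration (prior and error rates as assumed in the parent comment; the second figure assumes the same test is simply repeated independently):

# Posterior that melatonin works after a negative result, then iterated once more
update_neg <- function(prior, p_neg_given_W = 0.05, p_neg_given_notW = 0.95) {
  prior * p_neg_given_W / (prior * p_neg_given_W + (1 - prior) * p_neg_given_notW)
}
p1 <- update_neg(0.8)   # ~0.174 after one negative result
p2 <- update_neg(p1)    # ~0.011 after a second, independent negative result
c(p1, p2)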
I’ll reword that.
I’ll need to think about the Adderall point.
You’re welcome!
That’s how I did it.
It’s also possible that P(W|”~W”) is way lower than .05, and so the test could be better than that calculation makes it look. This is something you can figure out from basic stats and your experimental design, and I strongly recommend actually running the numbers. Psychology for years has been plagued with studies that are too small to actually provide valuable information, as people in general aren’t good intuitive statisticians.
As it happens, I learned how to do basic power calculations not that long ago. I didn’t do an explicit calculation for the melatonin trial because I didn’t randomize selection, instead using an alternating-days design and not always following it, so I thought: why bother doing one in retrospect?
But if we were to wave that away, the power seems fine. I have something like 141 days of data, of which around 90-100 are usable, giving me maybe <50 pairs? If I fire up R and load in the two means and the standard deviation (which I had left over from calculating the effect size), and then play with the numbers, then to get an 85% chance I could find an effect at p=0.01:

> pwr.t.test(d=(456.4783 - 407.5312) / 131.4656,power=0.85,sig.level=0.01,type="paired",alternative="greater")

     Paired t test power calculation

              n = 84.3067
              d = 0.3723187
      sig.level = 0.01
          power = 0.85
    alternative = greater

NOTE: n is number of *pairs*
If I drop the p=0.01 for 0.05, it looks like I should have had a good shot at detecting the effect:

> pwr.t.test(d=(456.4783 - 407.5312) / 131.4656,power=0.85,sig.level=0.05,type="paired",alternative="greater")

     Paired t test power calculation

              n = 53.24355
So, it’s not great, but it’s at least not terribly wrong?
EDIT: Just realized that I equivocated over days vs. pairs in my existing power analyses; one was wrong, but I apparently avoided the error in another, phew.
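A follow-up calculation that might be worth running: feed the number of usable pairs back in and let pwr solve for the power actually achieved, rather than the n required. A minimal sketch, assuming roughly 47 pairs as a stand-in for “<50”:

library(pwr)
d <- (456.4783 - 407.5312) / 131.4656   # same effect size as above

# power is left unspecified, so pwr.t.test solves for it
pwr.t.test(n = 47, d = d, sig.level = 0.05, type = "paired", alternative = "greater")
pwr.t.test(n = 47, d = d, sig.level = 0.01, type = "paired", alternative = "greater")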
I’m wondering why 0.05 (alpha) was used in that formula? True positive and false negative rates depend on statistical power (1-beta) and beta, and with beta = 0.2, the probability that melatonin is working given a negative result is 0.457 (not 0.1739).
“Melatonin is working” branch (prior P(W) = 0.8) has 2 possibilities:
True positive, P(“W”|W) = 1-b = 0.8
False negative, P(“~W”|W) = b = 0.2
“Melatonin is not working” branch (prior P(~W) = 0.2) has 2 possibilities:
False positive, P(“W”|~W) = a = 0.05
True negative, P(“~W”|~W) = 1-a = 0.95
P(W|”~W”) = P(“~W”|W) * P(W) / (P(“~W”|W) * P(W) + P(“~W”|~W) * P(~W)) =
(0.2 * 0.8) / ((0.2 * 0.8) + (0.95 * 0.2)) = 0.457, not 0.1739 (a ~3-fold difference)
I’m a bit confused because I’m getting different results, but maybe I’m wrong and someone can correct me?
I’m planning to run a blind experiment with melatonin, but want to learn more stats and better understand VOI before I start.
UPDATE: Math corrected. Thanks!
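For comparison, a minimal R sketch of the two versions of the update (the earlier comments’ symmetric 5% error rates versus the beta = 0.2 false-negative rate), showing that the disagreement is only about which false-negative rate gets plugged in:

# Posterior P(W | "~W") as a function of the assumed error rates
posterior_neg <- function(prior, false_neg, true_neg) {
  prior * false_neg / (prior * false_neg + (1 - prior) * true_neg)
}
posterior_neg(0.8, false_neg = 0.05, true_neg = 0.95)   # ~0.174 (symmetric 5% errors)
posterior_neg(0.8, false_neg = 0.20, true_neg = 0.95)   # ~0.457 (beta = 0.2, alpha = 0.05)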
Put simply, VOI is the difference between your expected value with and without the information.
So with Melatonin, let’s simplify to 2 possibilities:
A) Melatonin has no effect, costs $10 per year, for a value of −1 utilon.
B) Saves you 15 minutes per day (+5 utilons), costs $10 per year (-1 utilon), for a net value of +4 utilons.
Now, let’s say you think that A and B are equally likely. Then the expected value of not taking Melatonin is 0, and the expected value of taking it is 0.5 * (−1) + 0.5 * 4 = 1.5. With only this information available, you will always take Melatonin, so your expected value is 1.5.
Then let’s say you are considering a definitive experiment (so you will know with p=1 whether A or B is true).
If A is true then you will not take Melatonin, so the value of that outcome is 0 utilons.
If B is true, then you will take Melatonin, for a value of 4 utilons.
And by conservation of expected evidence, it is equally likely that the experiment will decide for A or B.*
Then the expected value of your decision with perfect info is 0.5 * 0 + 0.5 * 4 = 2 > 1.5, so the VOI is 0.5 utilons.
*Equally likely only because of how I set up the problem. Conservation of expected evidence would also be satisfied if the experiment would probably favor one side weakly, but improbably favor the other side strongly.
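The toy example above, as a minimal R sketch (utilon values as assumed in the example):

p_B     <- 0.5   # probability melatonin works (case B)
value_A <- -1    # no effect: pay ~$10/yr for nothing
value_B <-  4    # works: +5 utilons of sleep minus 1 utilon of cost

ev_take_blind   <- p_B * value_B + (1 - p_B) * value_A   # 1.5
ev_without_info <- max(ev_take_blind, 0)                 # 1.5: you take it anyway

# With a definitive experiment you only take melatonin in case B
ev_with_info <- p_B * value_B + (1 - p_B) * 0            # 2.0
ev_with_info - ev_without_info                           # VOI = 0.5 utilons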
So what should you conclude from this?
VOI is higher when the experiment shifts your beliefs a lot, lower when the expected change in belief is small. For example, praying is sufficiently unlikely to work that it’s not worth my time to test it. There are other cases where my uncertainty is high, but I can’t think of sufficiently good cheap experiments.
VOI is higher when you would gain a lot if it told you to change your plans. For example, if you would have taken Adderall without an experiment, and Adderall is expensive, then finding out it doesn’t work saves you a lot of money. This is less true for melatonin.
Expected value of taking M without information is 0.5 * (−1) + 0.5 * 4 = 1.5, not 1. VoI in this case is 0.5 utilons.
It’s certainly true that you wouldn’t be exploring things that didn’t have positive expected value, but wouldn’t the size of the expected value matter?
I don’t think it matters unless your investments are limited. If you are presented with X positive-expected-value investments, and you have enough funds for X+1 investments, what do you do? Invest in all X and reap the maximum possible return. (If you are limited to only 2 investments, then you will be very interested in which 2 of the X investments have the greatest expected value.)
This is pretty much the case with supplements: I don’t lack capital to invest in them (look at how cheap some of the examples are, like lithium or melatonin), I lack good candidates for investment!