Thanks for the link! That seems like what I want; for example, I didn’t have any problem plugging in my placebo/D ZQ scores to get a one-tailed p = 0.078395.
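For anyone following along, the same kind of calculation can be done with R’s built-in t.test; a minimal sketch with made-up ZQ vectors standing in for the real placebo/D scores:

```r
# Hypothetical ZQ scores; the real placebo/D data aren't reproduced here
zq_d       <- c(93, 88, 97, 91, 85, 90)
zq_placebo <- c(84, 90, 86, 88, 82, 85)

# One-tailed Welch two-sample t-test: is mean ZQ higher on D than on placebo?
t.test(zq_d, zq_placebo, alternative = "greater")
```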
You might want to do a multiple comparisons correction.
The only one I know is the Bonferroni one, but that’s for independent tests, IIRC, while I strongly expect correlations among the results (ZQ is made partially out of things like REM and deep sleep length, so there’d be correlations by definition, and one would expect my sleep quality rating to correlate with ZQ even assuming that’s not being factored into ZQ already).
Reading Wikipedia, I get the impression that using Bonferroni when I know the tests are not independent would result in fewer false positives, but many, many more false negatives. Since my data has so little power as it is...
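To see how punishing that gets, here’s a quick sketch with R’s built-in p.adjust and some made-up p-values; Bonferroni just multiplies each p-value by the number of tests (capping at 1):

```r
p <- c(0.004, 0.02, 0.03, 0.08, 0.2, 0.5)  # made-up p-values from 6 tests

# Bonferroni: multiply each p-value by the number of tests, capped at 1
p.adjust(p, method = "bonferroni")
# [1] 0.024 0.120 0.180 0.480 1.000 1.000  -- only the first survives 0.05
```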
Yes, that’s a good point. I suggest that if you are testing many hypotheses, you control the false discovery rate (here’s the useful original PDF, cited 10,000+ times).
As an example, let’s say that you test 6 hypotheses, corresponding to different features of your Zeo data. You could use a t-test for each, as above. Then aggregate and sort all the p-values in ascending order. Let’s say that they are 0.001, 0.013, 0.021, 0.030, 0.067, and 0.134.
Assume, arbitrarily, that you want the overall false discovery rate to be 0.05, which in this context is called the q-value. You would then test sequentially, from the last (largest) p-value to the first, whether the current p-value is less than or equal to ((its rank in the sorted list * the false discovery rate) / the total number of hypotheses). You stop at the first true inequality and call that hypothesis, along with every hypothesis with a smaller p-value, significant.
So in this example, you would stop when you correctly call 0.030 < ((4 * 0.05) / 6) = 0.0333 (the two larger p-values fail their thresholds: 0.134 > 0.05 and 0.067 > 0.0417), and the hypotheses corresponding to the first four p-values would be called significant.
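In R, the whole step-up procedure is only a few lines; a minimal sketch on the example p-values above, with the built-in p.adjust(..., method = "BH") as a cross-check:

```r
p <- c(0.001, 0.013, 0.021, 0.030, 0.067, 0.134)  # already sorted ascending
q <- 0.05                                         # target false discovery rate
m <- length(p)

# Benjamini-Hochberg step-up: find the largest rank i with p[i] <= (i*q)/m
passed <- p <= seq_len(m) * q / m
k <- if (any(passed)) max(which(passed)) else 0
significant <- seq_len(m) <= k   # reject hypotheses 1..k
significant
# [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE

# Built-in equivalent: BH-adjusted p-values compared against q
p.adjust(p, method = "BH") <= q
```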
Interesting procedure. I tried it out on my melatonin and one-legged-standing data, putting the results in the same footnotes as the R sessions, and, no surprise, nothing survives. (A little depressing, but then, there weren’t very many p-values in the 0.01-or-lower range to begin with.)
EDIT: however, one result from my vitamin D experiment did survive the multiple-comparison correction!