I just came across an example of this which might be helpful.
Essentially, getting good grades and having a desk in your room are apparently good predictors of whether you want to go to university. The former seemed sensible; the latter seemed like it shouldn’t have a big effect size, but I wanted to give it a chance.
The paper itself is here.
Just from the abstract you can tell there are at least 8 input variables, so after correcting the significance threshold for those 8 comparisons (Bonferroni-style), the numerator in Lehr’s equation grows from the usual 16 to ~26. This means a Cohen’s d of 0.1 (which I feel is pretty generous for having a desk in your room) would require ~2,600 participants in each group.
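As a sanity check on that ~26 figure, here is a minimal sketch (stdlib Python; the function name is mine) of Lehr’s rule-of-thumb numerator with a Bonferroni correction for multiple comparisons:

```python
from statistics import NormalDist

def lehr_numerator(alpha=0.05, power=0.8, n_tests=1):
    """Numerator of Lehr's rule (n per group = numerator / d**2),
    with a Bonferroni correction dividing alpha by n_tests."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - (alpha / n_tests) / 2)  # two-sided test
    z_power = nd.inv_cdf(power)
    return 2 * (z_alpha + z_power) ** 2

print(round(lehr_numerator()))           # classic rule of thumb: 16
print(round(lehr_numerator(n_tests=8)))  # corrected for 8 tests: 26
print(round(lehr_numerator(n_tests=8) / 0.1 ** 2))  # per-group n for d = 0.1
```

With 8 tests the per-group requirement for d = 0.1 comes out around 2,600, matching the back-of-the-envelope number above.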
As the groups are unlikely to be of equal size, I would estimate they would need a total of ~10,000 participants for this to have any chance of detecting the smaller effect sizes.
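That ballpark can be sanity-checked: for a two-group comparison the standard error scales with 1/n1 + 1/n2, so an unbalanced split needs a larger total N to match the power of a balanced design. A small sketch (the split fractions below are my assumption, not the paper’s):

```python
def total_n_unequal(n_equal_per_group, frac_small):
    """Total N for an unbalanced two-group design matching the power of a
    balanced design with n_equal_per_group per group.
    Solves 1/n1 + 1/n2 = 2/n_equal with n1 = frac_small * N."""
    return (1 / frac_small + 1 / (1 - frac_small)) * n_equal_per_group / 2

print(round(total_n_unequal(2600, 0.5)))  # balanced baseline: 5,200 total
print(round(total_n_unequal(2600, 0.2)))  # 80/20 split: ~8,100 total
print(round(total_n_unequal(2600, 0.1)))  # 90/10 split: ~14,400 total
```

So once one group is a small minority (as the no-desk students turn out to be), a five-figure total sample is about right.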
The actual sample size was ~1,000. At this point I would normally write off the study without bothering to go any deeper; the whole process takes less than 5 minutes.
I was curious to see how they managed to get multiple significant results despite the sample size limitations. It turns out they decided against reporting p-values because “we could no longer assume randomness of the sample”. Instead they reported the odds ratio for each result and claimed that anything with a large ratio had an effect, ignoring the uncertainty in those estimates entirely.
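Reporting bare odds ratios hides exactly the problem a small group creates. A minimal sketch of a Wald confidence interval for an odds ratio, with purely made-up counts (a 108-student no-desk group and an invented 892-student desk group; these are not the paper’s numbers):

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and 95% Wald CI for the 2x2 table [[a, b], [c, d]]."""
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log_or)
    hi = math.exp(math.log(or_) + z * se_log_or)
    return or_, lo, hi

# Hypothetical: 550/342 of 892 desk students want university,
# versus 60/48 of the 108 no-desk students.
or_, lo, hi = odds_ratio_ci(550, 342, 60, 48)
print(f"OR = {or_:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")  # the CI crosses 1
```

Even an odds ratio that looks comfortably above 1 comes with an interval straddling 1 once one cell of the table is this small, which is exactly the uncertainty the paper declined to report.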
It turns out there were only 108 students in the no-desk sample: definitely what Andrew Gelman calls a kangaroo measurement (trying to weigh a feather on a bathroom scale while a kangaroo jumps up and down on it).
There are plenty of other problems with the paper, but the sample size check alone (even though ~1,000 sounds respectable) was enough to confidently reject it with minimal effort.