Admittedly, if I ask R to run a Lilliefors test, the test rejects the hypothesis of normality (p = 0.0007), and it remains the case that the donations are neither log-normal nor power-law distributed because some of the values are zero.
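For readers without R to hand, here is a minimal pure-Python sketch of the Lilliefors test. It is an illustration, not R's `nortest::lillie.test`. It computes the Kolmogorov-Smirnov distance between the empirical CDF and a normal fitted by the sample mean and SD. It then gets a p-value by Monte Carlo simulation rather than from a table or approximation formula. `log_positive` is a hypothetical helper that shows why the zeros matter: they have no logarithm, so they must be set aside before anything can be tested for log-normality. The example data are synthetic, not the survey's.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def lilliefors_stat(xs):
    """KS distance between the empirical CDF and a normal CDF whose mean
    and SD are estimated from the same sample."""
    xs = sorted(xs)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # Check the ECDF just after and just before each jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def lilliefors_pvalue(xs, reps=1000, seed=0):
    """Monte Carlo p-value: how often a truly normal sample of the same size
    gives a statistic at least as large as the observed one. The statistic
    is location- and scale-invariant, so simulating N(0, 1) is enough."""
    rng = random.Random(seed)
    observed = lilliefors_stat(xs)
    n = len(xs)
    exceed = sum(
        lilliefors_stat([rng.gauss(0.0, 1.0) for _ in range(n)]) >= observed
        for _ in range(reps)
    )
    return (exceed + 1) / (reps + 1)


def log_positive(donations):
    """Zeros have no logarithm, so they are dropped before testing log-normality."""
    return [math.log(x) for x in donations if x > 0]


if __name__ == "__main__":
    # Synthetic stand-in data (NOT the survey): some zeros plus log-normal amounts.
    rng = random.Random(42)
    fake = [0.0] * 50 + [rng.lognormvariate(5.0, 1.5) for _ in range(250)]
    print(lilliefors_pvalue(log_positive(fake)))
```

The `+ 1` in numerator and denominator keeps the Monte Carlo p-value strictly positive. That is the usual convention when the observed statistic is treated as one more draw from the null.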
As I understand it, tests of normality are not all that useful because: they are underpowered & won’t reject normality at the small samples where you need to know about non-normality because it’ll badly affect your conclusions; and at larger samples like the LW survey, because real-world data is rarely exactly normal, they will always reject normality even when it makes not the slightest difference to your results (because the sample is now large enough to benefit from the asymptotics and various robustnesses).
When I was looking at donations vs EA status earlier this year, I just added 1 to every donation to deal with the zero-inflation, and then logged the donation amount. It seemed to work well. A zero-inflated log-normal might have worked even better.
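The two options can be set side by side in a short sketch. It assumes donations are a plain list of non-negative numbers. The first is the quick `log(x + 1)` transform. The second is a two-part "zeros plus log-normal" model. That model treats zero as its own category and fits a log-normal only to the positive amounts. Because the likelihood factorises, maximum likelihood reduces to the share of zeros plus the mean and (divide-by-m) SD of the logged positive amounts. The function names are hypothetical.

```python
import math


def log1p_all(donations):
    """Quick fix: shift by 1 so a zero maps to log(1) = 0, then take logs."""
    return [math.log1p(x) for x in donations]


def fit_zero_inflated_lognormal(donations):
    """Maximum-likelihood fit of a two-part model:
    P(X = 0) = p_zero, and log(X) ~ Normal(mu, sigma) given X > 0.
    Assumes at least one positive donation."""
    n = len(donations)
    logs = [math.log(x) for x in donations if x > 0]
    m = len(logs)
    p_zero = (n - m) / n
    mu = sum(logs) / m
    # The MLE of sigma divides by m, not m - 1.
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / m)
    return p_zero, mu, sigma
```

The `+1` shift is simple but arbitrary. Shifting by 10 or 0.01 instead changes the shape of the low end. The two-part model avoids choosing a shift at the cost of one extra parameter.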
Also, you don’t have to look at only one year’s data; you can look at 3 or 4 by making sure to filter out responses based on whether they report answering a previous survey.
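One way to do that filtering is sketched below. The field names `year` and `took_previous` are hypothetical, and so is the scheme. It keeps the earliest pooled year whole, and from later years it keeps only respondents who say they did not take an earlier survey. This is conservative: someone whose only earlier survey falls outside the pooled window is dropped too. It also relies entirely on self-report.

```python
def pool_surveys(responses, years):
    """Pool several survey years while trying to count each person once.

    `responses` is a list of dicts with hypothetical keys "year" and
    "took_previous" (the answer to "did you take an earlier survey?").
    `years` is the set of years to pool.
    """
    first = min(years)
    pooled = []
    for r in responses:
        if r["year"] not in years:
            continue  # outside the pooling window
        # Keep everyone from the first pooled year; afterwards, first-timers only.
        if r["year"] == first or not r["took_previous"]:
            pooled.append(r)
    return pooled
```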
As I understand it, tests of normality are not all that useful because: they are underpowered & won’t reject normality at the small samples where you need to know about non-normality because it’ll badly affect your conclusions; and at larger samples [...], because real-world data is rarely exactly normal, they will always reject normality even when it makes not the slightest difference to your results
I agree that normality tests are too insensitive for most small samples, and too sensitive for pretty much any big sample, but I’d presumed there was a sweet spot (when the sample size is a few hundred) where normality tests have decent sensitivity without giving everything a negligible p-value, and that the LW survey is near that sweet spot. If I’d been lazy and used R’s out-of-the-box normality test (Shapiro-Wilk) instead of following goocy’s recommendation (Lilliefors, which R hides in its nortest library) I’d have got an insignificant p of 0.11, so the sample [edit: of non-zero donations] evidently isn’t large enough to guarantee rejection by normality tests in general.
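The sample-size argument in this exchange can be checked with a quick simulation. This is a sketch under stated assumptions. A mildly right-skewed log-normal (sigma = 0.25, an arbitrary choice) stands in for "real-world data that is not exactly normal". Rejection uses the approximate 5% critical value 0.886/sqrt(n) from Lilliefors's large-sample table rather than an exact p-value. With the same mild skew, the rejection rate should climb from low at n = 50 toward 1 at a few thousand, with a few hundred somewhere in between.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def lilliefors_stat(xs):
    """KS distance to a normal with mean and SD estimated from the sample."""
    xs = sorted(xs)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def rejection_rate(n, reps=100, sigma=0.25, seed=0):
    """Share of size-n samples from LogNormal(0, sigma) that the Lilliefors
    test rejects at roughly the 5% level (critical value approximated by
    0.886 / sqrt(n), the large-sample entry quoted for n > 30)."""
    rng = random.Random(seed)
    crit = 0.886 / math.sqrt(n)
    hits = sum(
        lilliefors_stat([rng.lognormvariate(0.0, sigma) for _ in range(n)]) > crit
        for _ in range(reps)
    )
    return hits / reps


if __name__ == "__main__":
    for n in (50, 300, 3000):
        print(n, rejection_rate(n))
```

The simulation only shows how detection depends on n. Whether a detected departure "makes a difference" depends on the downstream analysis, which a normality test does not measure.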
Also, you don’t have to look at only one year’s data; you can look at 3 or 4 by making sure to filter out responses based on whether they report answering a previous survey.
Certainly. It might be interesting to investigate whether the log-normal-with-zeroes distribution holds up in earlier years, and if so, whether the distribution’s parameters drift over time. Still, goocy’s complaint was about 2014’s data, so I stuck with that.
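A drift check could start by fitting the log-normal-with-zeroes model separately per year. The sketch below assumes a hypothetical layout of `(year, donation)` pairs and at least one positive donation per year. For each year it returns the share of zeros and the maximum-likelihood mean and SD of the logged positive donations.

```python
import math
from collections import defaultdict


def fit_by_year(responses):
    """Fit the zeros-plus-log-normal model separately for each survey year.

    `responses` is a list of (year, donation) pairs (hypothetical layout).
    Returns {year: (p_zero, mu, sigma)}, where mu and sigma are the MLE
    mean and SD of log(donation) over that year's positive donations.
    """
    by_year = defaultdict(list)
    for year, amount in responses:
        by_year[year].append(amount)
    fits = {}
    for year, amounts in sorted(by_year.items()):
        logs = [math.log(a) for a in amounts if a > 0]
        m = len(logs)
        mu = sum(logs) / m
        sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / m)
        fits[year] = ((len(amounts) - m) / len(amounts), mu, sigma)
    return fits
```

As a rough guide to whether `mu` has moved between years, each year's estimate has a standard error of about `sigma / sqrt(m)`, where `m` is that year's number of positive donations.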