Women were on average newer to the community (21 months vs. 39 for men), but to my surprise a t-test was unable to declare this significant. Maybe I’m doing it wrong?
Well, possibly. The t-distribution is used for “estimating the mean of a normally distributed population,” (yay wikipedia) and you’re trying to estimate the mean of a slanted-uniformly-distributed-with-a-spike-at-the-beginning population.
But there is another important consideration, which is that applying more scrutiny to unexpected results gives you systematic error (confirmation bias), and that’s bad. To avoid this big problem, any increase in test quality should probably be part of a wholesale reanalysis, i.e. prolly not gonna happen. But there is another route, which is just accepting that your results are imperfect and widening your mental error bars. After all, where does this systematic error come from when you re-analyze unexpected results? It comes from you making mistakes on other things too, but not re-analyzing them! So once you know about the systematic error, you also know about all these other mistakes you have on average made :P
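If you did want a test that doesn't lean on normality, a permutation test on the difference in means is one option. Here is a minimal sketch in Python. The two lists are made-up numbers standing in for the survey responses, and `permutation_test` is an illustrative helper, not the analysis anyone in this thread actually ran:

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Shuffles the group labels and counts how often the shuffled
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(_mean(a) - _mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[:len(a)]) - _mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical months-in-community values, NOT the real survey data.
women = [3, 6, 12, 12, 18, 24, 24, 36, 48]
men = [2, 6, 12, 18, 24, 24, 30, 36, 36, 36, 42, 48,
       48, 54, 60, 60, 66, 72, 72, 84]

print("p-value:", permutation_test(women, men))
```

The only assumption the test makes is that the labels are exchangeable under the null, so the odd shape of the distribution doesn't matter much. It doesn't fix the confirmation-bias problem above, though: it is still extra scrutiny aimed at one surprising result.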
Well, possibly. The t-distribution is used for “estimating the mean of a normally distributed population,” (yay wikipedia) and you’re trying to estimate the mean of a slanted-uniformly-distributed-with-a-spike-at-the-beginning population.
Yeah, it’d have to be some combination of a uniform Poisson (since we don’t seem to be growing a lot, per Yvain) and an exponential distribution (constant mortality of users). If we graph histograms, either blunt or fine-grained, it looks like that but also with weird huge spikes besides the original OB->LW spike:
But on the plus side, if we look at the genders as a box plot, we discover why the mean is lower for women but there’s no significance:
There are, after all, many fewer women.
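To see how much the group size matters, here is a rough sketch of Welch's t statistic computed from summary numbers. The means (21 and 39 months) are the ones quoted above. The standard deviation of 30 months and all the group sizes are assumptions picked for illustration, not figures from the survey:

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1's mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2's mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Means from the thread; SD = 30 and every n below are hypothetical.
for n_women in (10, 50, 100):
    t, df = welch_t(21, 30, n_women, 39, 30, 900)
    print(f"n_women={n_women:4d}  t={t:6.2f}  df={df:6.1f}")
```

With the same 18-month gap and the same spread, the statistic grows roughly with the square root of the smaller group's size, so a small group of women can leave a sizeable difference short of significance.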
The spikes are just due to people estimating in half-years: 12, 18, 24, 30, 36.
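That kind of rounding is easy to reproduce in a toy simulation. In the sketch below everything is an assumption: tenure is drawn from an exponential distribution with a 30-month mean, and 70% of the simulated respondents round their answer to the nearest half-year. None of these numbers come from the survey:

```python
import random
from collections import Counter


def simulate_reported_months(n, mean_months=30.0, p_round=0.7, seed=0):
    """Simulate self-reported tenure with half-year rounding.

    All parameters are hypothetical: true tenure is exponential, and a
    fraction p_round of respondents round to the nearest 6 months.
    """
    rng = random.Random(seed)
    reported = []
    for _ in range(n):
        true_months = rng.expovariate(1.0 / mean_months)
        if rng.random() < p_round:
            value = 6 * round(true_months / 6)  # nearest half-year
        else:
            value = round(true_months)  # nearest month
        reported.append(value)
    return reported


counts = Counter(simulate_reported_months(5000))
# Crude text histogram: one '#' per 10 respondents.
for months in range(10, 39):
    print(f"{months:3d} {'#' * (counts[months] // 10)}")
```

The bars at 12, 18, 24, 30 and 36 tower over their neighbours even though the underlying distribution is smooth, which is the same pattern as in the histograms.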