I think there’s something wrong with your analysis of the longer/shorter survey data.
[EDITED to add:] … and, having written this and gone back to read the comments on your post, I see that someone there has already said almost exactly the same as I’m saying here. Oh well.
You start out by saying that you should write longer posts if 25% more readers prefer long than prefer short (and similarly for writing shorter posts).
Then you consider three hypotheses: that (as near as possible to) exactly 25% more prefer long than prefer short, that (as near as possible to) exactly 25% more prefer short, and that the numbers preferring long and preferring short are equal.
And you establish that your posterior probability for the first of those is much bigger than for either of the others, and say
Our simple analysis led us to an actionable conclusion: there’s a 97% chance that the preference gap in favor of longer posts is closer to 25% than to 0%, so I shouldn’t hesitate to write longer posts.
Everything before the last step is fine (though, as you do remark explicitly, it would be better to consider a continuous range of hypotheses about the preference gap). But surely the last step is just wrong in at least two ways.
You can’t get from “preference gap of exactly 25% is much more likely than preference gap of exactly 0%” to “preference gap of at least 12.5% is much more likely than preference gap of at most 12.5%”.
The original question wasn’t whether the preference gap is at least 12.5%, it was whether it’s at least 25%.
With any reasonable prior, I think the data you have make it extremely unlikely that the preference gap is at least 25%.
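To make the distinction concrete, here’s a quick sketch with made-up counts (say 40 readers prefer longer, 25 prefer shorter, 35 have no preference — these are placeholders, not the actual survey numbers). Putting a flat Dirichlet prior over the three categories and looking at the tail probabilities of the gap shows how “gap ≥ 12.5% is likely” and “gap ≥ 25% is unlikely” can both be true at once:

```python
import random

# Hypothetical survey counts -- placeholders, not the real data:
longer, shorter, same = 40, 25, 35

# With a flat Dirichlet(1,1,1) prior, the posterior over the three
# category probabilities is Dirichlet(longer+1, shorter+1, same+1).
# Sample it via independent Gamma draws, normalized.
def dirichlet_sample(alphas, rng):
    xs = [rng.gammavariate(a, 1.0) for a in alphas]
    s = sum(xs)
    return [x / s for x in xs]

rng = random.Random(0)
n = 100_000
gaps = []
for _ in range(n):
    p_long, p_short, _ = dirichlet_sample(
        [longer + 1, shorter + 1, same + 1], rng)
    gaps.append(p_long - p_short)

p_ge_25 = sum(g >= 0.25 for g in gaps) / n
p_ge_125 = sum(g >= 0.125 for g in gaps) / n
print(f"P(gap >= 25%)   ~ {p_ge_25:.2f}")
print(f"P(gap >= 12.5%) ~ {p_ge_125:.2f}")
```

With these particular counts the posterior puts most of its mass above a 12.5% gap but only a small minority above 25% — which is exactly the difference between the question that was asked and the question the three-point-hypothesis comparison answers.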
[EDITED to add:] Oh, one other thing I meant to say but forgot (which, unlike the above, hasn’t already been said in comments on your blog). The assumption being made here is, roughly, that people responding to the survey are a uniform random sample from all your readers. But I bet they aren’t. In particular, I bet more “engaged” readers are (1) more likely to respond to the survey and (2) more likely to prefer longer meatier posts. So I bet the real preference gap among your whole readership is smaller than the one found in the survey. Of course you may actually prefer to optimize for the experience of your more engaged readers, but again that isn’t what you said you wanted to do :-).
Since at 4,000 words the post was running up against the limits of my stamina regardless of readers’ preferences, I trust my smart and engaged readers to make all the necessary nitpicks and caveats for me :)
First of all, according to the site stats more than 80% of the people who read the survey filled it out, so it makes sense to treat it as a representative sample. I forgot to mention that.
To your first point: you’re correct that “the real gap is almost certainly above 12.5%” isn’t exactly what my posterior is. Again, my goal was to make a decision, so I had to assign decisions based on what the data could show me. I don’t need a precise interpretation of the results to make a sensible decision based on them, as long as I’m not horribly mistaken about what the results mean.
And what the results mean is, in fact, pretty close to “the real gap is almost certainly above 12.5%” under some reasonable assumptions. Whatever the “real” gap (i.e. the gap I would get if I got an answer from every single one of my current and future readers), the possible gaps I could measure on the survey are almost certainly distributed in some unimodal and pretty symmetric distribution around it. This means that the measured results are about as likely to overshoot the “real gap” by x% as they are to undershoot, at least to a first approximation (i.e. ignoring things like how the question was worded and the phase of the moon). This in turn means that a measured result of a 15% gap on a large sample of readers does imply that the “real gap” is very likely to be close to 15% and above 12.5%.
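A rough sanity check of this argument, using a normal approximation (the respondent count and shares here are made up for illustration — not the actual survey numbers):

```python
import math

# Hypothetical numbers -- placeholders, not the actual survey data:
n = 200            # number of respondents
p_long = 0.40      # share preferring longer posts
p_short = 0.25     # share preferring shorter posts
gap = p_long - p_short   # measured gap: 15%

# Standard error of the difference of two shares from one multinomial
# sample: Var(p_long_hat - p_short_hat)
#   = [p_long*(1-p_long) + p_short*(1-p_short) + 2*p_long*p_short] / n
se = math.sqrt((p_long * (1 - p_long) + p_short * (1 - p_short)
                + 2 * p_long * p_short) / n)

# Treating the sampling distribution as symmetric around the truth,
# how likely is the real gap to be above 12.5%?
z = (gap - 0.125) / se
p_above = 0.5 * (1 + math.erf(z / math.sqrt(2)))
print(f"SE of gap ~ {se:.3f}, P(real gap > 12.5%) ~ {p_above:.2f}")
```

The point of the sketch is just that the standard error of the gap shrinks with sample size, so with a few hundred respondents a measured 15% gap sits within a couple of points of the truth — though note it also shows the “above 12.5%” conclusion is more “likely” than “almost certain” at this sample size.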
Thanks for taking the time to dig into the math, this is what it’s about.
Wait, if 80% of your readers took the survey then why on earth are you doing any kind of fancy statistics to estimate that preference gap? If you’ve got a representative sample that covers a large majority of your readers, then you know what the gap is: it’s the gap observed in the sample, which IIRC was a little under 15%. Done.
(The factor-of-2 change in the meaning of that 25% figure seems really strange to me, too, but I take it the issue is just that the way it was introduced didn’t mean what I thought it did.)
Again, my goal was to make a decision, so I had to assign decisions based on what the data could show me.
It seems to me like you came up with a sensible metric for determining whether posts should be made longer or shorter, conditional on the post length changing, but that it would be better to determine also whether or not post length should change. That’s sort of what the 25% cutoff was pointing at, but note that it doesn’t distinguish between the world where it’s split 60-5-35 (for longer-same-shorter) and the world where it’s split 25-75-0. The first world looks like it needs you to split out your readership and figure out what the subgroups are, and the second world looks like you should moderately increase post length.
(Of course, to actually get the right decision you also need the cost estimate for being too long vs. too short; one might assume that you should tinker with the length until the two unhappy groups are equally sized, but this rests on an assumption that is often wrong.)