Since at 4,000 words the post was running up against the limits of my stamina regardless of readers’ preferences, I trust my smart and engaged readers to make all the necessary nitpicks and caveats for me :)
First of all, according to the site stats more than 80% of the people who saw the survey filled it out, so it makes sense to treat it as a representative sample. I forgot to mention that.
To your first point: you’re correct that “the real gap is almost certainly above 12.5%” isn’t exactly what my posterior is. Again, my goal was to make a decision, so I had to assign decisions based on what the data could show me. I don’t need a precise interpretation of the results to make a sensible decision based on them, as long as I’m not horribly mistaken about what the results mean.
And what the results mean is, in fact, pretty close to “the real gap is almost certainly above 12.5%” under some reasonable assumptions. Whatever the “real” gap (i.e. the gap I would get if I got an answer from every single one of my current and future readers), the possible gaps I could measure on the survey are almost certainly distributed in some unimodal and pretty symmetric distribution around it. This means that the measured results are about as likely to overshoot the “real gap” by x% as they are to undershoot it, at least to a first approximation (i.e. ignoring things like how the question was worded and the phase of the moon). This in turn means that a measured result of a 15% gap on a large sample of readers does imply that the “real gap” is very likely to be close to 15% and above 12.5%.
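You can sanity-check that symmetry claim with a quick simulation. The population split, sample size, and number of trials below are all made up for illustration; the point is just that the measured gap clusters symmetrically around the real one:

```python
import random

random.seed(0)

# Hypothetical population: 40% prefer longer posts, 25% prefer shorter,
# 35% have no preference -- so the "real" gap is 15 percentage points.
population = ["longer"] * 40 + ["shorter"] * 25 + ["same"] * 35

def measured_gap(sample_size):
    """Gap (in percentage points) observed in one random survey sample."""
    sample = random.choices(population, k=sample_size)
    return 100 * (sample.count("longer") - sample.count("shorter")) / sample_size

# Re-run the survey many times and look at where the measured gaps land.
gaps = [measured_gap(400) for _ in range(10_000)]
mean_gap = sum(gaps) / len(gaps)
overshoot = sum(g > 15 for g in gaps) / len(gaps)  # fraction overshooting the real gap
```

On a sample of a few hundred, the measured gaps average out very close to the real 15-point gap, and roughly half overshoot while half undershoot, which is the symmetry the argument leans on.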
Thanks for taking the time to dig into the math, this is what it’s about.
Wait, if 80% of your readers took the survey then why on earth are you doing any kind of fancy statistics to estimate that preference gap? If you’ve got a representative sample that covers a large majority of your readers, then you know what the gap is: it’s the gap observed in the sample, which IIRC was a little under 15%. Done.
(The factor-of-2 change in the meaning of that 25% figure seems really strange to me, too, but I take it the issue is just that the way it was introduced didn’t mean what I thought it did.)
Again, my goal was to make a decision, so I had to assign decisions based on what the data could show me.
It seems to me like you came up with a sensible metric for determining whether posts should be made longer or shorter, conditional on the post length changing, but it would be better to also determine whether post length should change at all. That’s sort of what the 25% cutoff was pointing at, but note that it doesn’t distinguish between the world where it’s split 60-5-35 (for longer-same-shorter) and the world where it’s split 25-75-0. The first world looks like it needs you to split out your readership and figure out what the subgroups are, and the second world looks like you should moderately increase post length.
(Of course, to actually get the right decision you also need the cost estimate for being too long vs. too short; one might assume that you should tinker with the length until the two unhappy groups are equally sized, but this rests on an assumption that is often wrong.)