bentarm comments on Simpson’s Paradox

bentarm 14 Jan 2011 1:09 UTC
4 points
Let’s say the only data we’d collected were gender and whether or not the patient’s birthday was a Tuesday. Do you really think there is something to be gained from building four separate models now?

More seriously, if you collect enough variables, then purely by chance some partitioning of the data will point to the wrong conclusion.
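To make that concrete, here is a rough simulation sketch. Everything in it is an assumption chosen for illustration rather than anything from the original study: the recovery rates (0.6 treated, 0.5 untreated), the coin-flip covariates, and the function name `count_misleading_partitions`. The treatment genuinely helps by construction, and each covariate is pure noise, like the Tuesday-birthday flag. The function counts how many of those noise covariates produce at least one subgroup in which the treatment looks harmful.

```python
import random


def count_misleading_partitions(n_patients, n_covariates, seed=0):
    """Count irrelevant binary covariates for which at least one subgroup
    shows the treatment as harmful, although it helps by construction."""
    rng = random.Random(seed)

    # Assumed setup: random treatment assignment, recovery probability
    # 0.6 if treated and 0.5 otherwise (a true benefit of +0.1).
    treated = [rng.random() < 0.5 for _ in range(n_patients)]
    recovered = [rng.random() < (0.6 if t else 0.5) for t in treated]

    def effect(indices):
        """Recovery rate of treated minus untreated within a subgroup.
        Returns None when either arm of the subgroup is empty."""
        t = [recovered[i] for i in indices if treated[i]]
        u = [recovered[i] for i in indices if not treated[i]]
        if not t or not u:
            return None
        return sum(t) / len(t) - sum(u) / len(u)

    misleading = 0
    for _ in range(n_covariates):
        # A covariate unrelated to treatment or outcome,
        # e.g. "birthday fell on a Tuesday".
        flag = [rng.random() < 0.5 for _ in range(n_patients)]
        groups = (
            [i for i in range(n_patients) if flag[i]],
            [i for i in range(n_patients) if not flag[i]],
        )
        effects = [effect(g) for g in groups]
        # The partition is "misleading" if any subgroup has a negative effect.
        if any(e is not None and e < 0 for e in effects):
            misleading += 1
    return misleading


if __name__ == "__main__":
    for k in (5, 50, 500):
        print(k, count_misleading_partitions(200, k, seed=1))
```

Because the covariates are drawn one after another from the same seeded generator, the first few are identical across runs, so for a fixed seed the count can only stay the same or grow as you add covariates. The exact numbers depend on the seed and the sample size, so treat the output as illustrative only.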

I don’t think we disagree on anything important here. The main point is that you need to be careful when choosing which partitions of the data to use: arbitrarily partitioning along every available divide is not optimal.

PS: thanks for the typo correction. I really need to learn to proofread...