Great article overall. Regression to the mean is a key fact of statistics, and far too few people incorporate it into their intuition.
But there’s a key misunderstanding in the second-to-last graph (the one with the drawn-in blue and red “outcome” and “factor”). The black line, indicating a correlation of 1, corresponds to nothing in reality. The true relationship is the regression line, which runs from the vertical tangent point at the right (marked) to the vertical tangent point at the left (unmarked). If causality indeed runs from “factor” (height) to “outcome” (skill), the slope of that line is how much extra skill an extra helping of height will give you. Thus, the diagonal red line should follow this direction, not be parallel to the 45-degree black line. If you draw this line, you’ll notice that each point on it has equal vertical distance to the top and bottom of the elliptical “envelope” (which is, of course, not a true envelope for all the probability mass, just an indication that probability density is higher for any point inside than any point outside).
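For anyone who wants to check the geometry numerically, here is a minimal sketch. It assumes the ellipse is a constant-density contour of a zero-mean bivariate normal; the names `ellipse_point`, `vertical_tangent_slope`, and `chord_midpoint`, and the values of `sx`, `sy`, and `r`, are mine for illustration and do not come from the article.

```python
import math

def ellipse_point(t, sx, sy, r):
    """Point on the 1-sigma contour of a zero-mean bivariate normal
    with standard deviations sx, sy and correlation r (assumed setup)."""
    x = sx * math.cos(t)
    y = r * sy * math.cos(t) + sy * math.sqrt(1 - r * r) * math.sin(t)
    return x, y

def vertical_tangent_slope(sx, sy, r, n=100_000):
    """Slope of the line joining the rightmost and leftmost contour points,
    i.e. the two points where the tangent to the ellipse is vertical."""
    pts = [ellipse_point(2 * math.pi * k / n, sx, sy, r) for k in range(n)]
    right = max(pts, key=lambda p: p[0])
    left = min(pts, key=lambda p: p[0])
    return (right[1] - left[1]) / (right[0] - left[0])

def chord_midpoint(x, sx, sy, r):
    """Midpoint between the top and bottom of the ellipse at a given x."""
    t = math.acos(x / sx)
    top = ellipse_point(t, sx, sy, r)[1]
    bottom = ellipse_point(-t, sx, sy, r)[1]
    return (top + bottom) / 2

# Illustrative values: equal spreads, correlation 0.5.
sx, sy, r = 1.0, 1.0, 0.5
tangent_slope = vertical_tangent_slope(sx, sy, r)
regression_slope = r * sy / sx  # slope of the regression of outcome on factor

print(tangent_slope, regression_slope)
print(chord_midpoint(0.3, sx, sy, r), regression_slope * 0.3)
```

With equal spreads the tangent-to-tangent line has slope `r`, not 1, so it is flatter than the 45-degree line whenever the correlation is below 1. The midpoint check is the "equal vertical distance" property: at any x, the regression line sits halfway between the top and bottom of the ellipse.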
Things are a little more complex if the correlation is due to a mutual cause, “reverse” causation (from “outcome” to “factor”), or if “factor” is imperfectly measured. In that case, the line connecting the vertical tangents may not correspond to anything in reality, though it’s still what you should follow to get the “right” (minimum expected squared error) answer.
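The imperfect-measurement case can be sketched with a small simulation. Everything here is an assumption chosen for illustration: a causal slope of 1, unit-variance true factor, measurement noise with the same variance as the factor, and the helper names `ols_fit` and `mse`.

```python
import random

def ols_fit(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def mse(xs, ys, slope, intercept):
    """Mean squared error of the linear prediction slope * x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20_000
causal_slope = 1.0      # assumed true effect of the factor on the outcome

true_factor = [rng.gauss(0, 1) for _ in range(n)]
outcome = [causal_slope * x + rng.gauss(0, 0.5) for x in true_factor]
measured = [x + rng.gauss(0, 1) for x in true_factor]  # noisy measurement

fitted_slope, fitted_intercept = ols_fit(measured, outcome)
print(fitted_slope)  # well below the causal slope of 1

# Predicting from the noisy measurement: the fitted line beats the causal one.
print(mse(measured, outcome, fitted_slope, fitted_intercept))
print(mse(measured, outcome, causal_slope, 0.0))
```

Under these assumed variances the fitted slope should come out near 0.5, half the causal slope. So the tangent-to-tangent line no longer tells you what an extra unit of the true factor does, yet it still gives the smaller squared error when all you have is the measured value.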
This may seem to be a nitpick, but to me, this kind of precision is key to getting your intuition right.
Thanks for this important spot—I don’t think it is a nitpick at all. I’m switching jobs at the moment, but I’ll revise the post (and diagrams) in light of this. It might be a week though, sorry!
Bump.
(I realize you’re busy; this is just a friendly reminder.)
Also, I added one clause to my comment above: the bit about “imperfectly measured”, which is of course usually the case in the real world.
Belatedly updated. Thanks for your helpful comments!