Group-level information is still useful for shrinking estimates and correcting for the always-present unreliability of individual estimates; see, for example, the long conversation between me and Vaniver somewhere on LW, where we work through how you would shrink males’ and females’ SAT scores based on the College Board’s published reliability numbers.
And E(X’s deserved SAT score|X’s measured SAT score; X is male) - E(X’s deserved SAT score|X’s measured SAT score; X is female) was, like, four points? I still think people’s System 1 are likely to overestimate this difference if they know about the correlation more than underestimate it if they don’t.
And E(X’s deserved SAT score|X’s measured SAT score; X is male) - E(X’s deserved SAT score|X’s measured SAT score; X is female) was, like, four points?
A 4 point adjustment (or more) across all candidates based solely on 1 binary variable (gender) and a trivial centuries-old bit of statistical reasoning seems like a fairly impressive output, and likely to make a difference on the margin for thousands of applications out of the millions sent each year.
I still think people’s System 1 are likely to overestimate this difference if they know about the correlation more than underestimate it if they don’t.
And E(X’s deserved SAT score|X’s measured SAT score; X is male) - E(X’s deserved SAT score|X’s measured SAT score; X is female) was, like, four points?
For applicants scoring 800 on a hypothetical normally distributed math SAT, yep. For normally distributed tests, shrinkage is linear in the difference between the measured score and the group mean, so it’s smaller for less extreme scores.
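To make the shrinkage arithmetic concrete, here is a minimal sketch under a normal-normal model. The group means, group SDs, and standard error of measurement below are made-up illustrative values, not the College Board's published figures, and the function names are hypothetical. Each group SD is assumed to describe true scores, which is a simplification.

```python
def shrunken_score(measured, group_mean, group_sd, sem):
    """Posterior mean of the true score under a normal-normal model.

    group_sd is assumed to be the SD of true scores in the group;
    sem is the standard error of measurement of the test.
    """
    # Reliability: share of observed variance that is true-score variance.
    reliability = group_sd**2 / (group_sd**2 + sem**2)
    # Pull the measured score toward the group mean, linearly in the distance.
    return group_mean + reliability * (measured - group_mean)


# Hypothetical (mean, SD) per group; NOT real SAT statistics.
GROUPS = {"male": (530.0, 120.0), "female": (500.0, 110.0)}
SEM = 30.0  # hypothetical standard error of measurement


def gap(measured):
    """Difference in shrunken estimates for the same measured score."""
    male = shrunken_score(measured, *GROUPS["male"], SEM)
    female = shrunken_score(measured, *GROUPS["female"], SEM)
    return male - female


if __name__ == "__main__":
    for score in (500, 650, 800):
        print(score, round(gap(score), 2))
```

With these made-up inputs the gap is a few points and grows with the measured score. It only varies with the score because the two groups are given different SDs here; with equal SDs the gap would be the same at every score, namely (1 - reliability) times the difference in group means.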
(For some reason, I’m having difficulty finding the link to the actual conversation; I think Google search is not going into deep comment threads, and the search function is based off the site, rather than just a database of my comments. Anyone remember helpful keywords further upthread to get a link to the actual conversation?)
I still think people’s System 1 are likely to overestimate this difference if they know about the correlation more than underestimate it if they don’t.
Saying “we shouldn’t explicitly calculate something because some people might implicitly calculate that thing incorrectly” sounds to me like going in the exact wrong direction.
(For some reason, I’m having difficulty finding the link to the actual conversation; I think Google search is not going into deep comment threads, and the search function is based off the site, rather than just a database of my comments. Anyone remember helpful keywords further upthread to get a link to the actual conversation?)
And E(X’s deserved SAT score|X’s measured SAT score; X is male) - E(X’s deserved SAT score|X’s measured SAT score; X is female) was, like, four points? I still think people’s System 1 are likely to overestimate this difference if they know about the correlation more than underestimate it if they don’t.
And your evidence for this is...?
(For some reason, I’m having difficulty finding the link to the actual conversation; I think Google search is not going into deep comment threads, and the search function is based off the site, rather than just a database of my comments. Anyone remember helpful keywords further upthread to get a link to the actual conversation?)
Here is Wei Dai’s tool for searching LW comments.