If you combine a low-noise signal with a high-noise signal, the combined signal can be of medium noise. Combining information isn’t always useful if you want to use both signals as proxies for the same thing.
Agreed that if you have P(A|B) and P(A|C), then you don’t have enough to get P(A|BC).
But if you have the right objects and they’re well-calibrated, then adding in a new measurement always improves your estimate. (You might not be sure that they’re well-calibrated, in which case it might make sense to not include them, and that can obviously include trying to estimate P(A|BC) from P(A|C) and P(A|B).)
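To make the point concrete, here is a minimal sketch, assuming two independent, unbiased measurements of the same quantity with known noise variances (the function names and the numbers are illustrative, not from any dataset). A naive average of a low-noise and a high-noise signal can indeed be noisier than the low-noise signal alone, but weighting each signal by its inverse variance gives a combined estimate that is never noisier than the better input.

```python
def naive_average_variance(var_a, var_b):
    """Variance of (X + Y) / 2 for independent X and Y."""
    return (var_a + var_b) / 4.0


def inverse_variance_combined_variance(var_a, var_b):
    """Variance of the inverse-variance-weighted combination of X and Y."""
    return 1.0 / (1.0 / var_a + 1.0 / var_b)


def inverse_variance_estimate(x_a, var_a, x_b, var_b):
    """Combine two measurements, weighting each by 1 / variance.

    Assumes both measurements are unbiased, independent, and have
    correctly known variances (i.e. they are well-calibrated).
    """
    w_a = 1.0 / var_a
    w_b = 1.0 / var_b
    return (w_a * x_a + w_b * x_b) / (w_a + w_b)


# Hypothetical variances: a low-noise signal (1.0) and a high-noise one (16.0).
low_noise_var, high_noise_var = 1.0, 16.0

naive = naive_average_variance(low_noise_var, high_noise_var)
weighted = inverse_variance_combined_variance(low_noise_var, high_noise_var)

print(f"low-noise signal alone:    {low_noise_var:.3f}")
print(f"naive average:             {naive:.3f}")     # worse than the good signal
print(f"inverse-variance weighted: {weighted:.3f}")  # better than the good signal
```

The catch is the calibration assumption: if the stated variances are wrong, the weights are wrong too, and the combined estimate can be worse than the better signal on its own.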
To combine information in such a way, you would have to believe that the average black person with an IQ of 120 will get a higher GPA than the average white person with the same IQ.
Not quite. Regression to the mean implies that you should apply shrinkage which is as specific as possible, but this shrinkage should obviously be applied to all applicants. (Regressing black scores to the mean, and not regressing white scores, for example, is obviously epistemic malfeasance, but regressing black scores to the black mean and white scores to the white mean makes sense, even if the IQ-grades relationship is the same for blacks and whites.)
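As a rough sketch of what I mean by shrinkage, assuming the standard linear form in which an observed score is pulled toward its group mean in proportion to the test's unreliability (the function name, the reliability of 0.8, and the group means are all hypothetical, chosen only for illustration):

```python
def shrink_to_group_mean(observed, group_mean, reliability):
    """Estimate the true score by regressing an observed score toward a mean.

    reliability is the assumed fraction of observed-score variance that is
    true-score variance (between 0 and 1). The same reliability is used for
    every applicant; only the mean being regressed toward differs.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return group_mean + reliability * (observed - group_mean)


# Hypothetical numbers: the same observed score and the same reliability,
# regressed toward two different (made-up) group means.
reliability = 0.8
observed = 120.0

estimate_group_a = shrink_to_group_mean(observed, group_mean=100.0, reliability=reliability)
estimate_group_b = shrink_to_group_mean(observed, group_mean=110.0, reliability=reliability)

print(estimate_group_a, estimate_group_b)
```

Both scores are shrunk by the same rule, and the score-to-outcome relationship is untouched; the only thing that changes is the target of the shrinkage. With a perfectly reliable measurement (reliability of 1) the group mean drops out entirely.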
It could also be that the GPA-job performance link is different for whites and blacks, even if the IQ-GPA link is the same for whites and blacks. (And, of course, race could impact job performance directly, but it seems likely the effects should be indirect for almost all jobs.)
I think there is little reason to believe that’s true.
If you’re just comparing GPAs, rather than GPAs weighted by course difficulty, there could be a systematic difference in the difficulty of classes that applicants take by race. I’ve had a hard time getting numerical data on this, for obvious reasons, but there are rumors that some institutions may have a grade bias in favor of blacks. (Obviously, you can’t fit a parameter to a rumor, but this is reason to not discount an effect that you do see in your data.)
Simple models often outperform more complicated ones.
Yes, but… motivated cognition alert. If you’re building models correctly, you take this into account by default, and so there’s no point in bringing it up for any particular input because you should already be checking it for every input.