Useful Statistical Biases
Friday’s post on statistical bias and the bias-variance decomposition discussed how the expected squared error of an estimator equals the squared bias of the estimator (its directional error) plus the variance of the estimator. All else being equal, bias is bad—you want to get rid of it. But all else is not always equal. Sometimes, by accepting a small amount of bias in your estimator, you can eliminate a large amount of variance. This is known as the “bias-variance tradeoff”.
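Written out, with θ̂ the estimator and θ the quantity being estimated, the decomposition is:

```latex
\operatorname{MSE}(\hat{\theta})
  = \mathbb{E}\!\left[(\hat{\theta}-\theta)^{2}\right]
  = \underbrace{\bigl(\mathbb{E}[\hat{\theta}]-\theta\bigr)^{2}}_{\text{squared bias}}
  \;+\;
  \underbrace{\mathbb{E}\!\left[\bigl(\hat{\theta}-\mathbb{E}[\hat{\theta}]\bigr)^{2}\right]}_{\text{variance}}
```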
A linear regression tries to estimate a quantity by attaching weights to various signals associated with that quantity—for example, you could try to predict the gas mileage of a car using the car’s mass and engine capacity.
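For the mileage example, the fitted model is just a weighted sum of the signals; the weights here are hypothetical, shown only to illustrate the form:

```latex
\widehat{\text{mpg}} \;=\; w_{0} \;+\; w_{1}\cdot\text{mass} \;+\; w_{2}\cdot\text{engine capacity}
```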
A regularized linear regression tries to attach smaller variable weights, while still matching the data fairly well. A regularized regression may generalize to unseen data better than an unregularized regression—often quite a lot better. Assigning smaller variable weights is akin to finding a simpler explanation that fits the data almost as well. This drive for simplicity makes the regressor less sensitive to small random wobbles in the data, so it has lower variance: if you ran the regressor over different data samples, the estimates would look more similar to each other.
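Here is a minimal sketch of what regularization does to the weights, using ridge regression on made-up mileage data; the synthetic numbers and the penalty strength `alpha` are assumptions for illustration, not anything from the post:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: mass (tonnes) and engine capacity (litres) for 20 cars,
# with mileage that depends on both, plus noise.
n = 20
X = np.column_stack([rng.uniform(0.8, 2.5, n),   # mass
                     rng.uniform(1.0, 5.0, n)])  # engine capacity
y = 60 - 12 * X[:, 0] - 4 * X[:, 1] + rng.normal(0, 2, n)  # mpg

# Centre everything so the intercept can be ignored in this sketch.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

def fit(Xc, yc, alpha=0.0):
    """Least-squares weights, with an optional ridge penalty alpha."""
    d = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(d), Xc.T @ yc)

print("unregularized weights:", fit(Xc, yc))
print("ridge weights (alpha=5):", fit(Xc, yc, alpha=5.0))
```

With the penalty turned on, both weights are pulled toward zero: the fit to this particular sample gets slightly worse, but the fitted weights wobble less from sample to sample.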
But the same regularization procedure also causes the estimator to ignore some actual data—and this is a systematic error that would recur in the same direction if we repeated the experiment many times. The randomness goes in both directions, so by ignoring the noise in the data, you decrease your variance. But the real evidence goes in one direction, so if you ignore some real evidence in the process of ignoring noise—because you don’t know which is which—then you end up with a directional error: one that trends the same way every time you repeat the experiment.
In statistics this is known as the bias-variance tradeoff. When your data is limited, it may be better to use a simplifying estimator that doesn’t try to fit every tiny squiggle of the data, accepting a little bias in exchange for a large reduction in variance.
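A small simulation can make the tradeoff concrete: draw many small samples from the same made-up mileage setup, refit the weights each time, and compare the systematic error and the scatter with and without the penalty. Everything here (the "true" weights, the noise level, the penalty) is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def ridge_fit(Xc, yc, alpha):
    """Centred least-squares weights with ridge penalty alpha (alpha=0 gives the plain fit)."""
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(Xc.shape[1]), Xc.T @ yc)

def simulate(alpha, trials=2000, n=15):
    """Refit the mileage model on many small synthetic samples; return bias and variance of the weights."""
    true_w = np.array([-12.0, -4.0])          # made-up "true" effects of mass and engine capacity
    fits = []
    for _ in range(trials):
        X = np.column_stack([rng.uniform(0.8, 2.5, n),    # mass
                             rng.uniform(1.0, 5.0, n)])   # engine capacity
        y = 60 + X @ true_w + rng.normal(0, 2, n)
        fits.append(ridge_fit(X - X.mean(axis=0), y - y.mean(), alpha))
    W = np.array(fits)
    return W.mean(axis=0) - true_w, W.var(axis=0)   # (directional error, sample-to-sample scatter)

print("alpha=0:", simulate(0.0))   # bias near zero, larger variance
print("alpha=5:", simulate(5.0))   # weights biased toward zero, smaller variance
```

The unpenalized fits land near the true weights on average but scatter widely; the penalized fits are pulled systematically toward zero while scattering much less.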
An “unbiased estimator” is one whose expected result equals the correct result, although it may have wide random swings in either direction. This is good if you are allowed to repeat the experiment as often as you like, because you can average together the estimates and get the correct answer to arbitrarily fine precision. That’s the law of large numbers.
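In symbols: an unbiased estimator is right on average, and averaging k independent repetitions shrinks the random swings (the variance of the average falls like 1/k), so the average converges on the true value:

```latex
\mathbb{E}[\hat{\theta}] = \theta,
\qquad
\frac{1}{k}\sum_{i=1}^{k}\hat{\theta}_{i} \;\longrightarrow\; \theta
\quad \text{as } k \to \infty .
```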
You might have the following bright idea—why not use an unbiased estimator, like an unregularized regression, to guess the bias of a regularized regression? Then you could just subtract out the systematic bias—you could have low bias and low variance. The problem with this, you see, is that while it may be easy to find an unbiased estimator of the bias, this estimate may have very large variance—so if you subtract out an estimate of the systematic bias, you may end up subtracting out way too much, or even subtracting in the wrong direction a fair fraction of the time. In statistics, “unbiased” is not the same as “good”, unless the estimator also has low variance.
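To see how this bright idea can undo itself, take the simplest version: estimate the regularized estimator’s bias as the gap between it and the unregularized fit on the same data. Under the usual assumption that the unregularized fit is unbiased, this gap is indeed an unbiased estimate of the bias, but subtracting it just hands you back the high-variance unregularized estimate you were trying to avoid:

```latex
\widehat{\mathrm{bias}} \;=\; \hat{\theta}_{\mathrm{reg}} - \hat{\theta}_{\mathrm{unreg}},
\qquad
\mathbb{E}\bigl[\widehat{\mathrm{bias}}\bigr] = \mathbb{E}[\hat{\theta}_{\mathrm{reg}}] - \theta,
\qquad
\hat{\theta}_{\mathrm{reg}} - \widehat{\mathrm{bias}} \;=\; \hat{\theta}_{\mathrm{unreg}} .
```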
When you hear that a classroom gave an average estimate of 871 beans for a jar that contained 850 beans, and that only one individual student did better than the crowd, the astounding notion is not that the crowd can be more accurate than the individual. The astounding notion is that human beings are unbiased estimators of beans in a jar, having no significant directional error on the problem, yet with large variance. It implies that we tend to get the answer wrong but there’s no systematic reason why. It requires that there be lots of errors that vary from individual to individual—and this is reliably true, enough so to keep most individuals from guessing the jar correctly. And yet there are no directional errors that everyone makes, or if there are, they cancel out very precisely in the average case, despite the large individual variations. Which is just plain odd. I find myself somewhat suspicious of the claim, and wonder whether other experiments that found less amazing accuracy were not as popularly reported.
Someone is bound to suggest that cognitive biases are useful, in the sense that they represent a bias-variance tradeoff. I think this is just mixing up words—just because the word “bias” is used by two different fields doesn’t mean it has the same technical definition. When we accept a statistical bias in trade, we can’t get strong information about the direction and magnitude of the bias—otherwise we would just subtract it out. We may be able to get an unbiased estimate of the bias, but “unbiased” is not the same as “reliable”; if the variance is huge, we really have very little information.
Now with cognitive biases, we do have some idea of the direction of the systematic error, and the whole notion of “overcoming bias” is about trying to subtract it out. Once again, we see that cognitive biases are lemons, not lemonade. To the extent we can get strong information—e.g. from cognitive psychology experiments—about the direction and magnitude of a systematic cognitive error, we can do systematically better by trying to compensate.