Useful Statistical Biases
Friday’s post on statistical bias and the bias-variance decomposition discussed how the expected squared error of an estimator breaks down into the square of its directional error (its bias) plus its variance. All else being equal, bias is bad—you want to get rid of it. But all else is not always equal. Sometimes, by accepting a small amount of bias in your estimator, you can eliminate a large amount of variance. This is known as the “bias-variance tradeoff”.
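In symbols (a standard statement of the decomposition, added here for reference; the post itself doesn't write it out): for an estimator $\hat\theta$ of a true quantity $\theta$,

$$\mathbb{E}\big[(\hat\theta - \theta)^2\big] = \underbrace{\big(\mathbb{E}[\hat\theta] - \theta\big)^2}_{\text{bias}^2} + \underbrace{\operatorname{Var}(\hat\theta)}_{\text{variance}}$$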
A linear regression tries to estimate a quantity by attaching weights to various signals associated with that quantity—for example, you could try to predict the gas mileage of a car using the car’s mass and engine capacity.
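As a minimal sketch of that kind of regression (the numbers and column choices below are made up for illustration; nothing here comes from the post):

```python
import numpy as np

# Hypothetical data: each row is a car, columns are mass (tonnes) and
# engine capacity (litres); y is the observed gas mileage (mpg).
X = np.array([[1.2, 1.6],
              [1.5, 2.0],
              [1.8, 2.5],
              [2.1, 3.0],
              [1.1, 1.4]])
y = np.array([38.0, 31.0, 26.0, 21.0, 40.0])

# Add an intercept column and fit ordinary least squares:
# find the weights that minimize the squared prediction error.
X1 = np.column_stack([np.ones(len(X)), X])
weights, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(weights)  # [intercept, weight on mass, weight on engine capacity]
```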
A regularized linear regression tries to attach smaller variable weights, while still matching the data fairly well. A regularized regression may generalize to unseen data better than an unregularized regression—often quite a lot better. Assigning smaller variable weights is akin to finding a simpler explanation that fits the data almost as well. This drive for simplicity makes the regressor less sensitive to small random wobbles in the data, so it has lower variance: if you ran the regressor over different data samples, the estimates would look more similar to each other.
But the same regularization procedure also causes the estimator to ignore some actual data—and this is a systematic error that would recur in the same direction if we repeated the experiment many times. The randomness goes in both directions, so by ignoring the noise in the data, you decrease your variance. But the real evidence goes in one direction, so if you ignore some real evidence in the process of ignoring noise—because you don’t know which is which—then you end up with a directional error, an error that trends in the same direction when you repeat the experiment many times.
In statistics this is known as the bias-variance tradeoff. When your data is limited, it may be better to use a simplifying estimator that doesn’t try to fit every tiny squiggle of the data, trading away a lot of variance at the cost of a little bias.
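A small simulation makes the tradeoff concrete. This is a sketch under assumed toy parameters (a single true weight of 1.0, Gaussian noise, a ridge penalty of 4.0), not anything from the post: across many resampled datasets, the ridge estimates give up a little bias in exchange for a noticeably smaller variance than the unregularized estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w, n, lam, trials = 1.0, 20, 4.0, 5000

ols_estimates, ridge_estimates = [], []
for _ in range(trials):
    x = rng.normal(size=n)
    y = true_w * x + rng.normal(scale=2.0, size=n)
    ols_estimates.append((x @ y) / (x @ x))          # unregularized fit
    ridge_estimates.append((x @ y) / (x @ x + lam))  # regularized (ridge) fit

for name, est in (("OLS", ols_estimates), ("ridge", ridge_estimates)):
    est = np.array(est)
    print(f"{name:5s} bias={est.mean() - true_w:+.3f} variance={est.var():.3f}")
# Typical output: OLS has roughly zero bias but the larger variance;
# ridge has a small systematic (negative) bias but a smaller variance.
```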
An “unbiased estimator” is one whose expected result equals the correct result, although it may have wide random swings in either direction. This is good if you are allowed to repeat the experiment as often as you like, because you can average together the estimates and get the correct answer to arbitrarily fine precision. That’s the law of large numbers.
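A quick illustration of that averaging, with made-up numbers (a true value of 850 and an unbiased estimator whose individual guesses swing widely):

```python
import numpy as np

rng = np.random.default_rng(1)
true_value = 850
for repeats in (1, 100, 10_000):
    estimates = rng.normal(loc=true_value, scale=300, size=repeats)
    print(repeats, round(estimates.mean(), 1))
# Averaging more and more unbiased estimates homes in on 850,
# even though any single estimate may be far off.
```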
You might have the following bright idea—why not use an unbiased estimator, like an unregularized regression, to guess the bias of a regularized regression? Then you could just subtract out the systematic bias—you could have low bias and low variance. The problem with this, you see, is that while it may be easy to find an unbiased estimator of the bias, this estimate may have very large variance—so if you subtract out an estimate of the systematic bias, you may end up subtracting out way too much, or even subtracting in the wrong direction a fair fraction of the time. In statistics, “unbiased” is not the same as “good”, unless the estimator also has low variance.
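To see the problem concretely, here is a sketch in the same toy setup as above (my example, not the post's). Under the usual linear-model assumptions the unregularized estimate is unbiased, so the difference between the ridge estimate and the unregularized estimate is an unbiased estimate of the ridge estimator's bias; subtracting it out simply hands you back the high-variance unregularized estimate.

```python
import numpy as np

rng = np.random.default_rng(2)
true_w, n, lam = 1.0, 20, 4.0
ridge_only, debiased = [], []
for _ in range(5000):
    x = rng.normal(size=n)
    y = true_w * x + rng.normal(scale=2.0, size=n)
    ols = (x @ y) / (x @ x)          # unbiased, high variance
    ridge = (x @ y) / (x @ x + lam)  # biased, low variance
    # (ridge - ols) is an unbiased estimate of the ridge bias,
    # but subtracting it out is algebraically just OLS again.
    debiased.append(ridge - (ridge - ols))
    ridge_only.append(ridge)
print("ridge variance:   ", round(np.var(ridge_only), 3))
print("debiased variance:", round(np.var(debiased), 3))
# The "debiased" estimator recovers all of the variance that
# regularization had removed.
```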
When you hear that a classroom gave an average estimate of 871 beans for a jar that contained 850 beans, and that only one individual student did better than the crowd, the astounding notion is not that the crowd can be more accurate than the individual. The astounding notion is that human beings are unbiased estimators of beans in a jar, having no significant directional error on the problem, yet with large variance. It implies that we tend to get the answer wrong but there’s no systematic reason why. It requires that there be lots of errors that vary from individual to individual—and this is reliably true, enough so to keep most individuals from guessing the jar correctly. And yet there are no directional errors that everyone makes, or if there are, they cancel out very precisely in the average case, despite the large individual variations. Which is just plain odd. I find myself somewhat suspicious of the claim, and wonder whether other experiments that found less amazing accuracy were not as popularly reported.
Someone is bound to suggest that cognitive biases are useful, in the sense that they represent a bias-variance tradeoff. I think this is just mixing up words—just because the word “bias” is used by two different fields doesn’t mean it has the same technical definition. When we accept a statistical bias in trade, we can’t get strong information about the direction and magnitude of the bias—otherwise we would just subtract it out. We may be able to get an unbiased estimate of the bias, but “unbiased” is not the same as “reliable”; if the variance is huge, we really have very little information. Now with cognitive biases, we do have some idea of the direction of the systematic error, and the whole notion of “overcoming bias” is about trying to subtract it out. Once again, we see that cognitive biases are lemons, not lemonade. To the extent we can get strong information—e.g. from cognitive psychology experiments—about the direction and magnitude of a systematic cognitive error, we can do systematically better by trying to compensate.
The claim that people are unbiased estimators of beans in a jar is precisely what I find disquieting about wisdom-of-crowds arguments—they require that our errors be nondirectional and normally distributed, but we know they aren’t. We have cognitive biases!
I have a somewhat related question, and openly admit to being a neophyte. My question is this: traditional variance weights positive and negative outcomes equally. How can one compute a variance that reflects a person's bias (risk aversion) toward a directional outcome? For example, in business, assume an ill-favored outcome is worth 0.5x and a preferred outcome is worth 1.5x. Would a person compute two variances by splitting the data into two sub-populations, ill-favored and preferred, apply the formula var(bX) = b^2 × sigma^2 to each population, and sum the final products? (A literal version of this computation is sketched just after this comment.) Am I wrong in this line of thinking? Is there another approach? It's been quite some time since my university stats days, so please be gentle with my ignorance.
I'd appreciate your thoughts; ping my email to let me know: miroslodki (at) yahoo (dot) ca
BTW, fascinating site and discussion regarding crowd wisdom. FWIW, I share your viewpoints/concerns; you've found a new reader. Cheers, Miro
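For what it's worth, here is a literal sketch of the computation described in the question above, with made-up outcome numbers and the 0.5x / 1.5x weights taken from the question. Whether summing the two pieces is the right way to aggregate them is exactly the open question; this only shows the mechanics.

```python
import numpy as np

# Hypothetical outcomes measured relative to some reference point
# (negative = ill-favored, positive = preferred).
outcomes = np.array([-3.0, -1.5, -0.5, 0.8, 1.2, 2.5, 4.0])

down_weight, up_weight = 0.5, 1.5   # weights proposed in the question
ill_favored = outcomes[outcomes < 0]
preferred = outcomes[outcomes >= 0]

# Apply var(bX) = b^2 * var(X) to each subpopulation, then sum,
# as the question proposes.
total = down_weight**2 * ill_favored.var() + up_weight**2 * preferred.var()
print(total)
```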
Say I were convinced that there is valuable wisdom of the crowds in guessing the number of jelly beans in a jar.
I would be especially surprised by the average of humans’ guesses being accurate, because I would have thought the more relevant measure was each person’s estimation error as a percentage (the difference between the jar and the guess, taken as a fraction of the larger of the two), whether it was an overestimate or an underestimate.
A guess of 425 beans is off by 425 beans and 50% (425 is half of 850), a guess of 1275 is off by 425 beans and 33.33% (850 is two-thirds of 1275), and a guess of 1700 is off by 850 beans and 50% (850 is half of 1700).
For a young child glancing at a map, which is a worse guess for how many states are in the U.S., 1 or 100? 10 or 100?
I think 1 and 10 are worse.
This can be solved by taking the error on a log scale rather than a linear scale.
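A quick illustration, using the 850-bean jar and the guesses from the comment above (the log-ratio measure is one common way to score errors on a log scale):

```python
import math

true_count = 850
for guess in (425, 1275, 1700):
    linear_error = guess - true_count
    log_error = math.log(guess / true_count)
    print(guess, linear_error, round(log_error, 3))
# On a log scale, guessing half (425) and guessing double (1700) are
# equally wrong (log-errors of -0.693 and +0.693), while 1275 is a
# smaller error (+0.405).
```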