Am I the only one that thinks that this is a silly definition of bias?
The technical definition of bias, the one you’re using, is that given a true value, the expected value of the estimate is equal to the true value. The one that I’d use is that given an estimate, the expected value of the true value is equal to the estimate. The latter is what you should be minimizing.
You should be using Bayesian methods to find these expected values, and they generally are biased, at least in the technical sense. You shouldn’t come up with an unbiased estimator and correct for it using Bayesian methods. You should use a biased estimator in the first place.
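To make that last point concrete, here is a minimal simulation sketch (my own illustration, not from the thread, assuming a Normal(theta, sigma^2) model with a conjugate Normal(0, tau^2) prior and made-up parameter values): the posterior-mean estimator shrinks toward the prior mean, so for any fixed nonzero true value it is biased in the technical sense, even though it is exactly the estimate a Bayesian would report.

```python
import numpy as np

# Sketch: posterior mean of a normal mean under a Normal(0, tau^2) prior.
# The shrinkage factor pulls the estimate toward the prior mean of 0, so
# E[estimate | theta] = shrink * theta != theta, i.e. the Bayesian point
# estimate is biased in the frequentist ("technical") sense.
rng = np.random.default_rng(0)

sigma, tau, n = 1.0, 1.0, 5        # assumed noise sd, prior sd, sample size
theta_true = 2.0                   # a fixed true value, for illustration
shrink = (n / sigma**2) / (n / sigma**2 + 1 / tau**2)

samples = rng.normal(theta_true, sigma, size=(100_000, n))
posterior_means = shrink * samples.mean(axis=1)    # prior mean is 0

print("true value:          ", theta_true)
print("E[estimate | theta]: ", posterior_means.mean())             # ~ 5/6 * 2
print("frequentist bias:    ", posterior_means.mean() - theta_true)
```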
The technical definition is E[estimate - true value] where the true value is typically taken as a number and not a variable we have uncertainty about, but there’s nothing in this definition preventing the true value from being a random variable.
Yes, the technical definition is E[estimate - parameter], but “unbiased” has an implicit “for all parameter values”. You really can’t stick a random variable there and have the same meaning that statisticians use.
(That said, I don’t see how DanielLC’s reformulation makes sense.)
It won’t have the same meaning, but nothing in the math prevents you from doing it and it might be more informative since it allows you to look at a single bias number instead of an uncountable set of biases (and Bayesian decision theory essentially does this). To be a little more explicit, the technical definition of bias is:
E[estimator|true value] - true value
And if we want to minimize bias, we try to do so over all possible values of the true value. But we can easily integrate over the space of the true value (assuming some prior over the true value) to obtain
E[ E[estimator|true value] - true value ] = E[ estimator - true value ]
This is similar to the Bayes risk of the estimator with respect to some prior distribution (the difference is that we don’t have a loss function here). By analogy, I might call this “Bayes bias.”
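As a quick illustration of the difference (my own sketch, not from the comment, using a hypothetical shrinkage estimator and an assumed Normal(1, 1) prior), the ordinary bias is one number per true value, while the “Bayes bias” averages those numbers over the prior into a single figure:

```python
import numpy as np

rng = np.random.default_rng(1)
n_rep, n, sigma, shrink = 200_000, 5, 1.0, 0.8   # made-up values for illustration

def estimate(theta):
    """A hypothetical biased estimator: shrink the sample mean toward 0."""
    loc = np.broadcast_to(np.asarray(theta, dtype=float), (n_rep,))
    x = rng.normal(loc[:, None], sigma, size=(n_rep, n))
    return shrink * x.mean(axis=1)

# Ordinary bias: E[estimator | true value] - true value, one number per theta.
for theta in (-2.0, 0.0, 2.0):
    print(f"bias at theta = {theta:+.1f}:", estimate(theta).mean() - theta)

# "Bayes bias": average the same quantity over a prior on theta, collapsing
# the whole set of biases into the single number E[estimator - true value].
thetas = rng.normal(1.0, 1.0, size=n_rep)        # assumed Normal(1, 1) prior
print("Bayes bias:", (estimate(thetas) - thetas).mean())
```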
The only issue is that your estimator may be right on average, but that doesn’t mean it’s going to be anywhere close to the true value. Usually bias is used along with the variance of the estimator (since MSE(estimator) = Variance(estimator) + [Bias(estimator)]^2), but we could just modify our definition of Bayes bias to take the absolute value of the difference, so that we still only have to look at one number; values closer to zero mean better estimators. Then we’re just calculating Bayes risk with respect to some prior and absolute error loss, i.e.
E[ | estimator - true value | ]
(Which is NOT in general equal to | E[estimator - true value] |, by Jensen’s inequality.)
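A toy numeric check of that parenthetical (nothing specific to any particular estimator): if the error estimator - true value is +1 half the time and -1 half the time, the average absolute error is 1 even though the average error is 0.

```python
import numpy as np

# Jensen gap in miniature: errors of +1 and -1 with equal probability.
errors = np.array([+1.0, -1.0])
print(np.abs(errors.mean()))    # |E[estimator - true value]|  -> 0.0
print(np.abs(errors).mean())    # E[|estimator - true value|]  -> 1.0
```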