The technical definition of unbiasedness, the one you're using, is that given a true value, the expected value of the estimate equals the true value. The one I'd use is that given an estimate, the expected value of the true value equals the estimate. The latter is what you should be minimizing.
The technical definition is E[estimate - true value], where the true value is typically taken as a fixed number rather than a variable we have uncertainty about, but there's nothing in this definition preventing the true value from being a random variable.
Yes, the technical definition is E[estimate - parameter], but "unbiased" has an implicit "for all parameter values". You really can't stick a random variable there and have the same meaning that statisticians use.
(That said, I don’t see how DanielLC’s reformulation makes sense.)
It won’t have the same meaning, but nothing in the math prevents you from doing it and it might be more informative since it allows you to look at a single bias number instead of an uncountable set of biases (and Bayesian decision theory essentially does this). To be a little more explicit, the technical definition of bias is:
E[estimator|true value] - true value
And if we want to minimize bias, we try to do so over all possible values of the true value. But we can easily integrate over the space of the true value (assuming some prior over it) to obtain
E[ E[estimator|true value] - true value ] = E[ estimator - true value ]
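To make this concrete, here is a minimal Monte Carlo sketch in Python. Every specific in it is an assumption for illustration, not something from the discussion: a shrinkage estimator (0.9 times the sample mean), unit-variance normal data, and an N(1, 2^2) prior. It shows that the classical bias is a different number for every true value, while integrating over the prior collapses everything to a single "Bayes bias":

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20  # sample size (illustrative)

def classical_bias(theta, reps=200_000):
    # E[estimator | true value] - true value, approximated by simulation.
    # Illustrative estimator: shrink the sample mean toward zero by 0.9,
    # so its bias is -0.1 * theta, a different number for each true value.
    samples = rng.normal(theta, 1.0, size=(reps, n))
    estimates = 0.9 * samples.mean(axis=1)
    return estimates.mean() - theta

for theta in (-2.0, 0.0, 2.0):  # one bias per parameter value
    print(f"bias at theta = {theta:+.1f}: {classical_bias(theta):+.4f}")

# "Bayes bias": draw theta from the assumed N(1, 2^2) prior and average
# the same quantity over it, yielding a single number (~ -0.1 * 1.0 here).
thetas = rng.normal(1.0, 2.0, size=200_000)
samples = rng.normal(thetas[:, None], 1.0, size=(200_000, n))
print("Bayes bias:", (0.9 * samples.mean(axis=1) - thetas).mean())
```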
This is similar to the Bayes risk of the estimator with respect to some prior distribution (the difference is that we don’t have a loss function here). By analogy, I might call this “Bayes bias.”
The only issue is that an estimator can be right on average without ever being close to the true value. Usually bias is used along with the variance of the estimator (since MSE(estimator) = Variance(estimator) + [Bias(estimator)]^2), but we could modify our definition of Bayes bias so that a single number suffices: take the absolute value of the difference, so that values closer to zero mean better estimators. Then we're just calculating the Bayes risk with respect to some prior and absolute error loss, i.e.
E[ | estimator - true value | ]
(Which is NOT, in general, equal to | E[estimator - true value] |, by Jensen's inequality.)
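Continuing the same hypothetical setup as the sketch above, we can put the two quantities side by side; Jensen's inequality guarantees the first is at least the second, and here the gap is large because the errors straddle zero:

```python
import numpy as np

rng = np.random.default_rng(1)

# Same illustrative assumptions as before: theta ~ N(1, 2^2) prior,
# data ~ N(theta, 1), estimator = 0.9 * sample mean.
thetas = rng.normal(1.0, 2.0, size=200_000)
samples = rng.normal(thetas[:, None], 1.0, size=(200_000, 20))
errors = 0.9 * samples.mean(axis=1) - thetas

print("E[ |estimator - true value| ] =", np.abs(errors).mean())  # Bayes risk
print("| E[estimator - true value] | =", abs(errors.mean()))     # |Bayes bias|
# The first is much larger: the average error is small even though the
# typical error magnitude is not, which is exactly why bias alone is
# not enough to judge an estimator.
```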