You said you should drop X if you know that your estimate is high variance but that the actual value doesn’t vary much. Knowing that the actual value doesn’t vary much means your prior has low variance, while knowing that your estimate is noisy means that your prior for the error term has high variance.
So when you observe an estimate, you should attribute most of the variance to error, and regress your estimate substantially towards your prior mean. After doing that regression, you are better off including X than dropping it, as far as I can see. (Of course, if the regressed estimate is sufficiently small then it wasn’t even worth computing the estimate, but that’s a normal issue with allocating bounded computational resources and doesn’t depend on the variance of your estimate of X, just how large you expect the real value to be.)
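A minimal sketch of the update being described, assuming a Gaussian prior on the true value of X and Gaussian noise on the estimate (all the numbers here are made up for illustration):

```python
def regress_to_prior(estimate, prior_mean, prior_var, noise_var):
    """Posterior mean of the true value given a noisy estimate.

    Standard conjugate-Gaussian update: the estimate is shrunk toward
    the prior mean by a factor of prior_var / (prior_var + noise_var).
    """
    k = prior_var / (prior_var + noise_var)
    return prior_mean + k * (estimate - prior_mean)

# Actual values of X barely vary (low prior variance), but the
# estimator is very noisy (high error variance), so most of the
# observed value gets attributed to error.
posterior = regress_to_prior(estimate=10.0, prior_mean=0.0,
                             prior_var=1.0, noise_var=9.0)
print(posterior)  # 1.0 — the observed 10.0 is regressed heavily toward 0
```

The point is that the regressed estimate is still nonzero, so including it is still (weakly) better than dropping the term entirely.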
But that’s not mysterious, that’s just regression to the mean.
I don’t understand—in what way is it regression to the mean?
Also, what does that have to do with my original comment, which is that you will do better by dropping high-variance terms?