Manfred comments on The Optimizer’s Curse and How to Beat It

Manfred 18 Sep 2011 8:48 UTC
1 point
Imagine filling a bunch of boxes with gold coins, with the number of coins in each box normally distributed around 50 with standard deviation 10. Then we’ll add some Gaussian noise to get the estimates written on the boxes, but we’ll split the boxes into 2 groups. Ten boxes will have noise with a standard deviation of 5, while the other ten will have noise with a standard deviation of 25.
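To make the setup concrete, here is a rough simulation sketch. It treats coin counts as continuous numbers, and the function names, trial count, and seed are illustrative choices rather than anything from the original comment. It measures how much the biggest estimate overshoots the true contents of its box, on average.

```python
import random

PRIOR_MEAN = 50.0  # average number of coins per box
PRIOR_SD = 10.0    # spread of the true contents
NOISE_SDS = [5.0] * 10 + [25.0] * 10  # ten low-noise boxes, ten high-noise boxes


def make_boxes(rng):
    """Return one (true_coins, estimate, noise_sd) tuple per box."""
    boxes = []
    for noise_sd in NOISE_SDS:
        true_coins = rng.gauss(PRIOR_MEAN, PRIOR_SD)
        estimate = true_coins + rng.gauss(0.0, noise_sd)
        boxes.append((true_coins, estimate, noise_sd))
    return boxes


def average_overshoot_of_top_box(trials=2000, seed=0):
    """Average of (estimate - true_coins) for the box with the largest estimate."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        true_coins, estimate, _ = max(make_boxes(rng), key=lambda box: box[1])
        total += estimate - true_coins
    return total / trials


if __name__ == "__main__":
    print(average_overshoot_of_top_box())
```

A positive result means that picking the box with the biggest label tends to disappoint, which is the optimizer’s curse in this toy setting.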

But since I’ve still kept the simple situation where we have just 2 groups, you can find the overall biggest estimate by picking the biggest from each group and comparing them. So we can treat the groups independently for a bit. Within a group, the biggest estimate is the one with the biggest positive deviation from 50, signal and noise combined. Because I used normal distributions this time, the combined prior+noise distribution is just a wider normal distribution. So given that an estimate is big or small according to this combined distribution, how should we expect the signal and the noise to have shifted? Well, it would be silly to expect one of them to be more improbable than the other, so we expect each to have shifted from its own mean by about the same number of its own standard deviations. That right there means that the bigger the noise, the more of the variation we should attribute to noise. It also means that the bigger an element is in the combined distribution, the larger we should expect its noise to be.
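As a hedged illustration of that attribution, the sketch below uses the standard normal-normal result, in which the expected signal and noise parts of a deviation are split in proportion to their variances. That is, if anything, a stronger tilt toward noise than equal shifts in standard-deviation units. The function name and the example label of 80 are assumptions made for illustration.

```python
import math


def split_deviation(estimate, prior_mean=50.0, prior_sd=10.0, noise_sd=5.0):
    """Split (estimate - prior_mean) into expected signal and expected noise parts.

    Assumes true value ~ N(prior_mean, prior_sd**2) and independent
    noise ~ N(0, noise_sd**2), so the estimate has variance
    prior_sd**2 + noise_sd**2.
    """
    deviation = estimate - prior_mean
    total_var = prior_sd ** 2 + noise_sd ** 2
    signal_part = deviation * prior_sd ** 2 / total_var
    noise_part = deviation * noise_sd ** 2 / total_var
    return signal_part, noise_part


# Compare a box labeled 80 in the low-noise group and in the high-noise group.
for noise_sd in (5.0, 25.0):
    signal_part, noise_part = split_deviation(80.0, noise_sd=noise_sd)
    combined_sd = math.sqrt(10.0 ** 2 + noise_sd ** 2)
    print(f"noise sd {noise_sd:4.1f}: combined sd {combined_sd:5.2f}, "
          f"signal part {signal_part:5.2f}, noise part {noise_part:5.2f}")
```

With noise of standard deviation 5, most of the 30-coin deviation is attributed to signal. With noise of standard deviation 25, most of it is attributed to noise.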
- Oscar_Cunningham 18 Sep 2011 9:45 UTC
  0 points
  But if you know the boxes were originally drawn from N(50,100) then the number on the box is no longer the correct Bayesian mean. All I’m arguing is that once you have your Bayesian expected value you don’t need to update it any further.
  - Manfred 18 Sep 2011 10:13 UTC
    3 points
    
    > All I’m arguing is that once you have your Bayesian expected value you don’t need to update it any further.
    
    That’s pretty uncontroversial, but in practice it means that you end up penalizing high-noise boxes with high values (and boosting high-noise boxes with low values), which I think is a nontrivial result.
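A small sketch of that penalty and boost, assuming the N(50, 100) prior from the thread and the usual normal-normal posterior mean. The four example boxes and their labels are made up for illustration.

```python
def posterior_mean(estimate, noise_sd, prior_mean=50.0, prior_sd=10.0):
    """Bayesian expected contents of a box, given the number written on it.

    Assumes a N(prior_mean, prior_sd**2) prior on the true contents and
    independent Gaussian noise with standard deviation noise_sd.
    """
    weight = prior_sd ** 2 / (prior_sd ** 2 + noise_sd ** 2)
    return prior_mean + weight * (estimate - prior_mean)


# Hypothetical boxes: (description, number on the box, noise standard deviation).
boxes = [
    ("high-noise box labeled 80", 80.0, 25.0),
    ("low-noise box labeled 60", 60.0, 5.0),
    ("low-noise box labeled 40", 40.0, 5.0),
    ("high-noise box labeled 20", 20.0, 25.0),
]

# Rank by Bayesian expected value rather than by the raw label.
ranked = sorted(boxes, key=lambda box: posterior_mean(box[1], box[2]), reverse=True)

for name, estimate, noise_sd in ranked:
    print(f"{name}: expected {posterior_mean(estimate, noise_sd):.2f} coins")
```

Under these assumptions the low-noise box labeled 60 ranks above the high-noise box labeled 80, and the high-noise box labeled 20 ranks above the low-noise box labeled 40.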