Well, in some circumstances this kind of reasoning would actually change the decision you make. For example, you might have one option with a high estimate and very high confidence, and another option with an even higher estimate but lower confidence. After applying the approach described in the article, those two options might end up switching positions in the ranking.
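To make that flip concrete, here is a minimal sketch, under the assumption that the article’s correction amounts to Gaussian shrinkage of each noisy estimate toward a common prior (a standard way to adjust for this effect; the numbers and names below are mine, not taken from the article):

```python
def shrink(estimate, noise_sd, prior_mean=0.0, prior_sd=1.0):
    """Posterior mean with a Gaussian prior and Gaussian measurement noise:
    the noisier the estimate, the harder it gets pulled toward the prior."""
    w = prior_sd**2 / (prior_sd**2 + noise_sd**2)
    return w * estimate + (1 - w) * prior_mean

# Option A: slightly lower estimate, very high confidence (little noise).
# Option B: higher estimate, low confidence (a lot of noise).
a = shrink(estimate=1.0, noise_sd=0.2)   # about 0.96, barely moved
b = shrink(estimate=1.2, noise_sd=1.0)   # about 0.60, heavily shrunk
print(a > b)   # True: the ranking has flipped after the adjustment
```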
BUT: most of the time, I don’t think this approach will make you choose a different option. If all other factors are equal, you’ll probably still pick the option with the highest expected value. I think the main lesson of the article is something else: the final result will probably be lower than your supposedly “unbiased” estimate, and once you understand that, you can budget accordingly.
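And “budget accordingly” can be quantified with an equally small toy simulation (again mine, not the article’s): every option below has the same true value, yet the option chosen for having the highest estimate overshoots that value by a predictable margin, which is exactly the margin to budget for.

```python
import random

random.seed(0)
n_options, noise_sd, trials = 10, 0.3, 10_000
overshoot = 0.0
for _ in range(trials):
    # All options are truly worth 1.0; we only see noisy, unbiased estimates.
    estimates = [1.0 + random.gauss(0.0, noise_sd) for _ in range(n_options)]
    overshoot += max(estimates) - 1.0   # chosen estimate minus realized value
print(f"average overestimate of the chosen option: {overshoot / trials:.2f}")
# About 0.46 here: the expected maximum of 10 standard normal draws (about 1.54)
# times the noise level, even though each estimate is individually unbiased.
```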
The big problem arises when the number of choices is huge and sparsely explored, such as when optimizing a neural network.
But if we restrict ourselves to n superficially evaluated choices, with known estimate variance in each evaluation and independent errors/noise, and if – as in realistic cases like Monte Carlo Tree Search – we are allowed to perform some additional “measurements” to narrow down the uncertainty, then it is wise to scrutinize the high-expectation choices the most – in a way trying to “falsify” their greatness, while increasing the certainty of their greatness if the falsification “fails”. This is the effect of using heuristics like the Upper Confidence Bound for experiment/branch selection.
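For reference, the standard UCB1 selection rule looks roughly like the sketch below (variable names are mine; the bonus term is what routes extra “measurements” to branches that are promising but still uncertain):

```python
import math

def ucb1_pick(counts, means, c=math.sqrt(2)):
    """Return the index of the branch/arm to sample next.
    Score = empirical mean + c * sqrt(ln(total samples) / samples of this branch)."""
    total = sum(counts)
    for arm, n in enumerate(counts):
        if n == 0:
            return arm   # try every branch at least once
    return max(range(len(counts)),
               key=lambda arm: means[arm] + c * math.sqrt(math.log(total) / counts[arm]))
```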
UCB is also described as “optimism in the face of uncertainty”, which rather defeats the point I am making if it is deployed as the decision policy itself. What I mean is that in research, preparation and planning (with tree search in perfect-information games as a formal example where UCB can be applied), one should put a lot of effort into finding out whether the seemingly best choice (of path, policy, etc.) really is that good, and then make a final choice that penalizes the remaining uncertainty.
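The final commitment could then flip the sign on the uncertainty term, something like this lower-confidence-bound sketch (the penalty factor k and the helper name are mine, purely for illustration): optimism while exploring, pessimism when committing.

```python
import math

def commit(means, counts, sample_sds, k=1.0):
    """Choose the option with the best mean minus k standard errors,
    i.e. penalize whatever uncertainty is left at decision time.
    Assumes every option has been sampled at least once."""
    def lcb(i):
        return means[i] - k * sample_sds[i] / math.sqrt(counts[i])
    return max(range(len(means)), key=lcb)
```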
I would like to throw in a Wikipedia article on a relevant topic, which I came across while reading about the related “Winner’s curse”: https://en.wikipedia.org/wiki/Order_statistic
The math for order statistics is quite neat as long as the variables are independently sampled from the same distribution. In real life, “sadly”, choice evaluations may not always come from the same distribution… Rather, they are by definition conditional on the choices. (https://en.wikipedia.org/wiki/Bapat%E2%80%93Beg_theorem provides a kind of solution, in the form of an intractable colossus of a calculation.) That is not to say that no valuable/informative approximations can be found.
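For the i.i.d. case the neatness is easy to see: the CDF of the maximum of n independent draws from a common distribution F is simply F(x)^n. A quick sketch of that, nothing more:

```python
import math
import random

random.seed(1)
n, trials = 5, 200_000

# Simulated mean of the maximum of n independent standard normals.
est = sum(max(random.gauss(0, 1) for _ in range(n)) for _ in range(trials)) / trials
print(f"E[max of {n} N(0,1) draws] is roughly {est:.2f}")   # about 1.16

# Exact CDF of that maximum at x = 1, using F(x)**n.
x = 1.0
Phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2)))              # standard normal CDF at x
print(f"P(max <= {x}) = Phi({x})**{n} = {Phi**n:.2f}")      # about 0.42
```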