But all you’ve done after “adjusting” the expected value estimates is produce a new batch of expected value estimates, which just shows either that the original estimates were not done very carefully (if there was an improvement), or that you face the same problem all over again...
Am I missing something?
I’m thinking of this as “updating on whether I actually occupy the epistemic state that I think I occupy”, which one hopes would be less of a problem for a superintelligence than for a human.
It reminds me of Yvain’s Confidence Levels Inside and Outside an Argument.
I expect it to be a problem—probably as serious—for superintelligence. The universe will always be bigger and more complex than any model of it, and I’m pretty sure a mind can’t fully model itself.
Superintelligences will presumably have epistemic problems we can’t understand, and probably better tools for working on them, but unless I’m missing something, there’s no way to make the problem go away.
Yeah, but at least it shouldn’t have all the subconscious signaling problems that compromise conscious reasoning in humans – at least I hope nobody would be dumb enough to build a superintelligence that deceives itself on account of social adaptations that don’t update when the context changes...
I must admit that I did not understand everything in the paper, but I think this excerpt summarizes a crucial point:
“The key issue here is proper conditioning. The unbiasedness of the value estimates V_i discussed in §1 is unbiasedness conditional on μ. In contrast, we might think of the revised estimates v̂_i as being unbiased conditional on V. At the time we optimize and make the decision, we know V but we do not know μ, so proper conditioning dictates that we work with distributions and estimates conditional on V.”
The proposed “solution” converts n independent evaluations into n evaluations (estimates) that respect the selection process, but, as far as I can tell, they still rest on prior value estimates and on prior knowledge about the uncertainty of those estimates… which means the “solution” at best limits the introduction of optimizer bias, and at worst… masks old mistakes?
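To make the conditioning point concrete, here is a quick simulation (a sketch under assumed conditions: normal true values with prior sd τ = 1 and normal estimate noise with sd σ = 1, so the paper’s revised estimates reduce to shrinking each estimate by τ²/(τ² + σ²) = 1/2):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 10, 100_000
mu = rng.normal(0.0, 1.0, size=(trials, n))      # true values, assumed prior N(0, 1)
V = mu + rng.normal(0.0, 1.0, size=(trials, n))  # unbiased estimates, noise sd 1
rows = np.arange(trials)

# Conditional on mu, each V_i is unbiased:
print(np.mean(V - mu))                            # ~ 0.0

# ...but the estimate of the option we *select* is not:
best = np.argmax(V, axis=1)
print(np.mean(V[rows, best] - mu[rows, best]))    # clearly positive: optimizer's curse

# Conditioning on V instead: the posterior mean E[mu_i | V_i] = V_i / 2 here.
v_hat = 0.5 * V
best2 = np.argmax(v_hat, axis=1)
print(np.mean(v_hat[rows, best2] - mu[rows, best2]))  # ~ 0.0 again
```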
Well, in some circumstances this kind of reasoning would actually change the decision you make. For example, you might have one option with a high estimate and very high confidence, and another option with an even higher estimate but lower confidence. After applying the approach described in the article, those two options might end up swapping positions in the rankings.
BUT: Most of the time, I don’t think this approach will make you choose a different option. If all other factors are equal, you’ll probably still pick the option with the highest expected value. I think what we learn from this article is more about something else: understanding that the realized result will probably be lower than your supposedly “unbiased” estimate. And when you understand that, you can budget accordingly.
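To make the rank swap concrete, here is a tiny worked example (hypothetical numbers, reusing the normal-normal shrinkage from the sketch above):

```python
tau2 = 1.0  # assumed prior variance of the true values

def shrink(estimate, noise_sd):
    """Posterior mean of the true value, given a noisy unbiased estimate."""
    return estimate * tau2 / (tau2 + noise_sd**2)

# Option A: estimate 1.0, very confident (noise sd 0.1).
# Option B: estimate 1.2, much less confident (noise sd 1.0).
print(shrink(1.0, 0.1))  # ~0.990 -> A now ranks first
print(shrink(1.2, 1.0))  # 0.600 -> B drops below A
```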
The big problem arises when the number of choices is huge and sparsely explored, such as when optimizing a neural network.
But if we restrict ourselves to n superficially evaluated choices with known estimate variance for each evaluation and independent errors/noise, and if – as in realistic cases like Monte Carlo Tree Search – we are allowed to perform some additional “measurements” to narrow down the uncertainty, then it is wise to scrutinize the high-expectation choices most – in effect trying to “falsify” their greatness, while increasing our certainty of their greatness if the falsification “fails”. This is the effect of using heuristics like the Upper Confidence Bound for experiment/branch selection.
UCB is also described as “optimism in the face of uncertainty”, which would defeat the point I am making if it were deployed as the final decision policy. What I mean is that in research, preparation, and planning (with tree search in perfect-information games as a formal example where UCB can be applied), one should put a lot of effort into finding out whether the seemingly best choice (of path, policy, etc.) really is that good, and then make a final choice that penalizes remaining uncertainty.
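A minimal sketch of that division of labor, with made-up Gaussian arms: explore with UCB1 (optimism), but commit with a lower confidence bound that penalizes whatever uncertainty remains:

```python
import math
import random

random.seed(1)
true_means = [0.5, 0.6, 0.55]             # hidden arm values (made up)
counts = [0] * len(true_means)
sums = [0.0] * len(true_means)

def pull(i):
    return random.gauss(true_means[i], 1.0)  # noisy reward

# Exploration phase: UCB1 -- "optimism in the face of uncertainty".
for t in range(1, 2001):
    if t <= len(true_means):
        arm = t - 1                        # pull each arm once to initialize
    else:
        arm = max(range(len(true_means)),
                  key=lambda i: sums[i] / counts[i]
                                + math.sqrt(2 * math.log(t) / counts[i]))
    counts[arm] += 1
    sums[arm] += pull(arm)

# Final choice: a *lower* confidence bound, penalizing remaining uncertainty.
total = sum(counts)
final = max(range(len(true_means)),
            key=lambda i: sums[i] / counts[i]
                          - math.sqrt(2 * math.log(total) / counts[i]))
print(counts, "-> choose arm", final)
```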
I would like to throw in a Wikipedia article on a relevant topic, which I came across while reading about the related “Winner’s curse”: https://en.wikipedia.org/wiki/Order_statistic
The math for order statistics is quite neat as long as the variables are independently sampled from the same distribution. In real life, “sadly”, choice evaluations may not all come from the same distribution… Rather, they are by definition conditional upon the choices. (https://en.wikipedia.org/wiki/Bapat%E2%80%93Beg_theorem provides a kind of solution, in the form of an intractable colossus of a calculation.) That is not to say that no valuable/informative approximations can be found.
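For the i.i.d. case, the neatness is easy to check numerically; a quick sketch (my own illustration, not from the article) of how the expected maximum of n standard-normal estimates – and hence the selection bias when all true values are equal – grows with n:

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (2, 10, 100, 1000):
    samples = rng.standard_normal((20_000, n))
    # The max of n i.i.d. draws has CDF F(x)^n; here we just estimate its mean.
    print(n, samples.max(axis=1).mean())
# Grows roughly like sqrt(2 * ln n): ~0.56, ~1.54, ~2.51, ~3.24
```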