If you use the Anti-Nirvana trick, your agent just goes “nothing matters at all, the foe will mispredict and I’ll get -infinity reward” and rolls over and cries since all policies are optimal. Don’t do that one, it’s a bad idea.
For the concave expectation functionals: Well, there’s another constraint or two, like monotonicity, but yeah, Legendre-Fenchel (LF) duality basically says that you can turn any (monotone) concave expectation functional into an inframeasure. I.e., all risk aversion can be interpreted as having radical uncertainty over some aspects of how the environment works and assuming you get worst-case outcomes from the parts you can’t predict.
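As a rough illustration of the easy direction of that correspondence (a worst case over a set of distributions is automatically monotone and concave; LF duality is what gives the converse), here is a minimal sketch. The three-outcome space, the `CREDAL_SET` values, and the function names are all made up for the example, and it uses plain probability distributions rather than full inframeasures.

```python
def worst_case_expectation(dists, f):
    """Infimum of the expected value of f over a finite set of distributions."""
    return min(sum(p * x for p, x in zip(d, f)) for d in dists)


def mix(f, g, lam):
    """Pointwise mixture lam * f + (1 - lam) * g of two payoff functions."""
    return [lam * a + (1 - lam) * b for a, b in zip(f, g)]


# Hypothetical set of distributions over three outcomes (illustrative numbers only).
CREDAL_SET = [
    (0.5, 0.5, 0.0),
    (0.2, 0.3, 0.5),
    (0.0, 0.1, 0.9),
]

f = [1.0, 0.0, 0.0]
g = [0.0, 0.0, 1.0]
lam = 0.5

# Concavity: the value of the mixture is at least the mixture of the values.
value_of_mix = worst_case_expectation(CREDAL_SET, mix(f, g, lam))
mix_of_values = (lam * worst_case_expectation(CREDAL_SET, f)
                 + (1 - lam) * worst_case_expectation(CREDAL_SET, g))

# Monotonicity: a pointwise-larger payoff never gets a smaller value.
h = [1.0, 1.0, 1.0]
value_f = worst_case_expectation(CREDAL_SET, f)
value_h = worst_case_expectation(CREDAL_SET, h)
```

Here the worst-case distribution differs between `f` and `g`, so hedging between them does strictly better than the average of their separate worst cases, which is the risk-averse behavior described above.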
For your concrete example, that’s why you have multiple hypotheses that are learnable. Sure, one of your hypotheses might have complete Knightian uncertainty over the odd bits, but another hypothesis might not. Betting on the odd bits is advised by a more-informative hypothesis, for sufficiently good bets. And the policy selected by the agent would probably be something like “bet on the odd bits occasionally, and if I keep losing those bets, stop betting”, as this wins in the hypothesis where some of the odd bits are predictable, and doesn’t lose too much in the hypothesis where the odd bits are completely unpredictable and out to make you lose.
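A toy sketch of that kind of policy, under simplifying assumptions that are mine rather than part of the formalism: bets are even-odds with a fixed stake, the informative hypothesis always guesses the same value, and the agent bets on every odd bit until a loss budget is exhausted (instead of only occasionally). The name `bet_with_cutoff` and its parameters are hypothetical.

```python
def bet_with_cutoff(odd_bits, guess=1, stake=1.0, loss_budget=3.0):
    """Bet `stake` on each odd bit equalling `guess`; stop once the budget is spent."""
    winnings = 0.0
    for bit in odd_bits:
        if winnings <= -loss_budget:
            break  # kept losing, so stop betting for good
        winnings += stake if bit == guess else -stake
    return winnings


# Hypothesis 1: the odd bits are predictable (always equal to the guess).
predictable = [1] * 20
# Hypothesis 2: the odd bits are adversarial (always the opposite of the guess).
adversarial = [0] * 20

gain_if_predictable = bet_with_cutoff(predictable)
loss_if_adversarial = bet_with_cutoff(adversarial)
```

The policy collects the full gain when the odd bits are predictable, while the adversarial environment can only take the loss budget, which is the "doesn’t lose too much" half of the argument.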
If you use the Anti-Nirvana trick, your agent just goes “nothing matters at all, the foe will mispredict and I’ll get -infinity reward” and rolls over and cries since all policies are optimal. Don’t do that one, it’s a bad idea.
Sorry, I meant the combination of best-case reasoning (sup instead of inf) and the anti-Nirvana trick. In that case the agent goes “Murphy won’t mispredict, since then I’d get -infinity reward which can’t be the best that I do”.
For your concrete example, that’s why you have multiple hypotheses that are learnable.
Hmm, that makes sense, I think? Perhaps I just haven’t really internalized the learning aspect of all of this.