The following is based on an in-person discussion with Johannes Treutlein (the second author of the OP).
> But is there some concrete advantage of zero-sum conditional prediction over the above method?
So, here’s a very concrete and clear (though perhaps not very important) advantage of the proposed method over the method I proposed. The method I proposed only works if you want to maximize expected utility relative to the predictor’s beliefs. The zero-sum competition model enables optimal choice under a much broader set of possible preferences over outcome distributions.
Let’s say that you have some arbitrary (potentially wacky, discontinuous) function V that maps a distribution over outcomes to a real value representing how much you like that distribution. Then you can run zero-sum competition as normal and select the action for which V is highest (with the usual “optimism bias”, i.e., if the two predictors make different predictions for an action a, take the maximum of V over the two predicted distributions). This should still be incentive compatible and result in taking the action that is best in terms of V applied to the predictors’ beliefs.
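To make the selection rule concrete, here is a minimal sketch (the function names and the example V are placeholders, not anything from the OP):

```python
# Zero-sum competition with "optimism bias": each of two predictors reports a
# distribution over outcomes for every action; for each action we take the
# max of V over the two reports, then pick the action with the highest value.

def choose_action(reports_1, reports_2, V):
    scores = [max(V(p), V(q)) for p, q in zip(reports_1, reports_2)]
    return scores.index(max(scores))

# Example with three outcomes and two actions: V rewards mass on outcome 2.
V = lambda p: p[2]
reports_1 = [[0.5, 0.5, 0.0], [0.2, 0.3, 0.5]]
reports_2 = [[0.5, 0.5, 0.0], [0.2, 0.3, 0.5]]
print(choose_action(reports_1, reports_2, V))  # -> 1
```

Note that V here need not be an expectation of any utility function; any map from distributions to reals works, which is exactly the claimed advantage.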
(Of course, one could have even crazier preferences. For example, one’s preferences could just be a function that takes as input a set of distributions and selects one distribution as its favorite. But if this preference function is intransitive, doesn’t satisfy independence of irrelevant alternatives, and the like, it’s not so clear whether the proposed approach still works. For example, a predictor might be able to slightly misreport some option that will not be taken anyway in such a way that the decision maker ends up taking a different action. I don’t think this is ever strictly incentivized, but it’s not strictly disincentivized either.)
Interestingly, if V is a strictly convex function over outcome distributions (why would it be? I don’t know!), then you can strictly incentivize a single predictor to report the best action and honestly report the full distribution over outcomes for that action! Simply use the scoring rule S(p_a, q_a) = V(p_a) + v(p_a)^T (q_a − p_a), where p_a is the reported distribution for the recommended action, q_a is the true distribution of the recommended action, and v is a subgradient of V. This is a proper scoring rule: by strict convexity, V(q_a) ≥ V(p_a) + v(p_a)^T (q_a − p_a), with equality only when p_a = q_a, so the predictor is incentivized to report p_a = q_a and thus gets a score of V(q_a). It will therefore recommend the action a whose associated distribution maximizes V. It’s easy to show that if V (the function saying how much you like different distributions) is not strictly convex, then you can’t construct such a scoring rule. If I recall correctly, these facts are also pointed out in one of the papers by Chen et al. on this topic.
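A quick numerical check of the scoring rule, using the illustrative choice V(p) = Σᵢ pᵢ² (strictly convex, with gradient v(p) = 2p); these names are mine, not from the OP. For this V the expected score works out to V(q) − ‖q − p‖², so the honest report p = q is strictly optimal:

```python
# S(p, q) = V(p) + v(p)^T (q - p), evaluated against the true distribution q.
# With V(p) = sum(p_i^2), algebra gives S(p, q) = V(q) - ||q - p||^2,
# so any misreport p != q loses exactly the squared distance to the truth.

def V(p):
    return sum(x * x for x in p)

def v(p):  # gradient of V (V is differentiable here, so this is the subgradient)
    return [2.0 * x for x in p]

def expected_score(p, q):
    return V(p) + sum(vi * (qi - pi) for vi, qi, pi in zip(v(p), q, p))

q = [0.2, 0.3, 0.5]
assert abs(expected_score(q, q) - V(q)) < 1e-12   # honest report earns V(q)
assert expected_score([0.4, 0.3, 0.3], q) < V(q)  # misreports earn strictly less
```

The identity S(p, q) = V(q) − ‖q − p‖² is specific to this quadratic V; for a general strictly convex V, the same strict-properness argument goes through via the supporting-hyperplane inequality.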
I don’t find this very important, because I find expected utility maximization w.r.t. the predictors’ beliefs much more plausible than anything else. But if nothing else, this difference further shows that the proposed method is fundamentally different and more capable in some ways than other methods (like the one I proposed in my comment).