Oh ok, yeah, that’s a nice setup and I think I know how to prove that claim — the convex optimization argument I mentioned should give it. I still endorse the branch of my previous comment that comes after considering roughly that option, though:
That said, if we conceive of the decision rule as picking out a single action to perform, then because the decision rule at least takes Pareto improvements, I think a convex optimization argument says that the single action it picks is indeed the maximal EV one according to some distribution (though not necessarily one in your set). However, if we conceive of the decision rule as giving preferences between actions or if we try to use it in some sequential setup, then I’m >95% sure there is no way to see it as EV max (except in some silly way, like forgetting you had preferences in the first place).
The branch that’s about sequential decision-making, you mean? I’m unconvinced by this too, see e.g. here — I’d appreciate more explicit arguments for this being “nonsense.”
To clarify, I think in this context I’ve only said that the claim “The minimax regret rule (sec 5.4.2 of Bradley (2012)) is equivalent to EV max w.r.t. the distribution in your representor that induces maximum regret” (and maybe the claim after it) was “false/nonsense” — in particular, because it doesn’t make sense to talk about a distribution that induces maximum regret (without reference to a particular action) — which I’m guessing you agree with.
I wanted to say that I endorse the following:
Neither of the two decision rules you mentioned is (in general) consistent with any EV max, whether we conceive of the rule as giving your preferences (not just picking out a best option) or as telling you what to do at each step of a sequential decision-making setup.
I think basically any setup provides an example for each of these claims. Here’s a canonical counterexample for the version about preferences, using the max_{actions} min_{probability distributions} EV (i.e., infrabayes) decision rule, i.e. with our preferences corresponding to the min_{probability distributions} EV ranking:
Let a and c be actions, and let b be flipping a fair coin and then doing a or c depending on the outcome. It is easy to construct a case where the max-min rule strictly prefers b to a and also strictly prefers b to c, and indeed where these preferences are strong enough that the rule still strictly prefers b to a small enough sweetening of a and to a small enough sweetening of c (in fact, a generic setup will have such a triple). Call these sweetenings a+ and c+ (think of them as a-but-you-also-get-one-cent or a-but-you-also-get-one-extra-moment-of-happiness or whatever; the important thing is that every utility function under consideration counts this one cent or extra moment of happiness as a positive). However, every EV max rule (that cares about the one cent) will strictly disprefer b to at least one of a+ or c+: if it didn’t, it would weakly prefer b to both a+ and c+, and hence weakly prefer b to a coinflip between a+ and c+. But that coinflip is just b+ (b with the same sweetening added), so the rule would weakly prefer b to b+, contradicting its caring about the sweetening. So these min preferences are incompatible with maximizing any EV.
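To make this concrete, here’s a minimal numeric sketch in Python. The two-state setup, the particular representor (P(s1) ∈ {0.9, 0.1}), and the 0.01 sweetening are just illustrative numbers I’m picking, not anything from the papers under discussion:

```python
# Minimal numeric sketch of the counterexample above (illustrative numbers only).
# Two states s1, s2; the representor contains two candidate values of P(s1).
representor = [0.9, 0.1]
EPS = 0.01                          # the "one cent" sweetening

# Each action is (utility in s1, utility in s2), under a single utility function.
a = (1.0, 0.0)
c = (0.0, 1.0)
b = (0.5, 0.5)                      # fair coinflip between a and c
a_plus = (a[0] + EPS, a[1] + EPS)   # sweetened a
c_plus = (c[0] + EPS, c[1] + EPS)   # sweetened c

def ev(action, q):
    """Expected utility of an action when P(s1) = q."""
    return q * action[0] + (1 - q) * action[1]

def min_ev(action):
    """Score the max-min (infrabayes-style) rule assigns to an action."""
    return min(ev(action, q) for q in representor)

# The max-min rule strictly prefers b to both sweetened actions:
assert min_ev(b) > min_ev(a_plus) and min_ev(b) > min_ev(c_plus)   # 0.5 > 0.11

# But no single distribution agrees: for every q, ev(a_plus, q) + ev(c_plus, q)
# equals 1 + 2 * EPS, so at least one sweetened action strictly beats b (EV 0.5).
for q in [i / 100 for i in range(101)]:
    assert max(ev(a_plus, q), ev(c_plus, q)) > ev(b, q)
```

So the max-min preferences b ≻ a+ and b ≻ c+ can’t be reproduced by EV max under any single distribution, which is the incompatibility claimed above.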
There is a canonical way to turn a counterexample in preference-land into a counterexample in sequential-decision-making-land: make the “sequential” setup a two-step game where you first randomly pick a pair of actions to offer the agent a choice between, and then the agent makes its choice. The game forces the max-min agent to “reveal its preferences” sufficiently for its policy to be inconsistent with EV maxing (sketched below with the same toy numbers). This is easiest to see if the agent is forced to make a binary choice, but it’s still true if you avoid forcing a strictly binary choice on the agent by saying that the agent still has access to (internal) randomization.
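Here’s that two-step game with the same made-up numbers as before; again just an illustrative sketch, with the definitions repeated so it runs on its own:

```python
# Two-step game: nature offers either the menu {a+, b} or the menu {c+, b}
# (each with probability 1/2), and the agent then picks one action from the menu.
# Same illustrative numbers as in the sketch above, repeated so this runs alone.
representor = [0.9, 0.1]                                   # candidate values of P(s1)
b, a_plus, c_plus = (0.5, 0.5), (1.01, 0.01), (0.01, 1.01)

def ev(action, q):
    return q * action[0] + (1 - q) * action[1]             # EV when P(s1) = q

def min_ev(action):
    return min(ev(action, q) for q in representor)         # max-min rule's score

menus = [a_plus, c_plus]                                    # the non-b option in each menu

# The max-min agent's policy takes b from both menus:
assert all(min_ev(b) > min_ev(x) for x in menus)

# No EV-maximizing policy does that: under every single distribution q,
# at least one menu's sweetened option strictly beats b, so an EV maximizer
# deviates from the max-min policy on at least one branch of the game.
for q in [i / 100 for i in range(101)]:
    assert any(ev(x, q) > ev(b, q) for x in menus)
```

So even in this trivially simple game, the max-min policy is not the policy of any EV maximizer.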
Regarding the Thornley paper you link: I’ve said some stuff about it in my earlier comments; my best guess for what to do next would be to prove some theorem about behavior that doesn’t make explicit use of a completeness assumption, but it also seems likely that this would fail to bear closely enough on our central disagreements to be worthwhile. I guess I’m generally feeling like I might bow out of this written conversation soon/now, sorry! But I’d be happy to talk more about this synchronously — if you’d like to schedule a meeting, feel free to message me on the LW messenger.