Sorry, I feel like the point I wanted to make with my original bullet point is somewhat vaguer/different than what you’re responding to. Let me try to clarify what I wanted to do with that argument with a caricatured version of the present argument-branch from my point of view:
your original question (caricatured): “The Sun prayer decision rule is as follows: you pray to the Sun; this makes a certain set of actions seem auspicious to you. Why not endorse the Sun prayer decision rule?”
my bullet point: “Bayesian expected utility maximization has this big red arrow pointing toward it, but the Sun prayer decision rule has no big red arrow pointing toward it.”
your response: “Maybe a few specific Sun prayer decision rules are also pointed to by that red arrow?”
my response: “The arrow does not point toward most Sun prayer decision rules. In fact, it only points toward the ones that are secretly bayesian expected utility maximization. Anyway, I feel like this does very little to address my original point that there is this big red arrow pointing toward bayesian expected utility maximization and no big red arrow pointing toward Sun prayer decision rules.”
(See the appendix to my previous comment for more on this.)
That said, I admit I haven’t said super clearly how the arrow ends up pointing to structuring your psychology in a particular way (as opposed to just pointing at a class of ways to behave). I think I won’t do a better job at this atm than what I said in the second paragraph of my previous comment.
The minimax regret rule (sec 5.4.2 of Bradley (2012)) is equivalent to EV max w.r.t. the distribution in your representor that induces maximum regret.
I’m (inside view) 99.9% sure this will be false/nonsense in a sequential setting. I’m (inside view) 99% sure this is false/nonsense even in the one-shot case. I guess the issue is that different actions get assigned their max regret by different distributions, so I’m not sure what you mean when you talk about the distribution that induces maximum regret. And indeed, it is easy to come up with a case where the action that gets chosen is not best according to any distribution in your set of distributions: let there be one action which is uniformly fine and also for each distribution in the set, let there be an action which is great according to that distribution and disastrous according to every other distribution; the uniformly fine action gets selected, but this isn’t EV max for any distribution in your representor. That said, if we conceive of the decision rule as picking out a single action to perform, then because the decision rule at least takes Pareto improvements, I think a convex optimization argument says that the single action it picks is indeed the maximal EV one according to some distribution (though not necessarily one in your set). However, if we conceive of the decision rule as giving preferences between actions or if we try to use it in some sequential setup, then I’m >95% sure there is no way to see it as EV max (except in some silly way, like forgetting you had preferences in the first place).
The maximin rule (sec 5.4.1) is equivalent to EV max w.r.t. the most pessimistic distribution.
I didn’t think about this as carefully, but >90% that the paragraph above also applies with minor changes.
You might say “Then why not just do precise EV max w.r.t. those distributions?” But the whole problem you face as a decision-maker is, how do you decide which distribution? Different distributions recommend different policies. If you endorse precise beliefs, it seems you’ll commit to one distribution that you think best represents your epistemic state. Whereas someone with imprecise beliefs will say: “My epistemic state is not represented by just one distribution. I’ll evaluate the imprecise decision rules based on which decision-theoretic desiderata they satisfy, then apply the most appealing decision rule (or some way of aggregating them) w.r.t. my imprecise beliefs.” If the decision procedure you follow is psychologically equivalent to my previous sentence, then I have no objection to your procedure — I just think it would be misleading to say you endorse precise beliefs in that case.
I think I agree in some very weak sense. For example, when I’m trying to diagnose a health issue,
I do want to think about which priors and likelihoods to use — it’s not like these things are immediately given to me or something. In this sense, I’m at some point contemplating many possible distributions to use. But I guess we do have some meaningful disagreement left — I guess I take the most appealing decision rule to be more like pure aggregation than you do; I take imprecise probabilities with maximality to be a major step toward madness from doing something that stays closer to expected utility maximization.
my response: “The arrow does not point toward most Sun prayer decision rules. In fact, it only points toward the ones that are secretly bayesian expected utility maximization. Anyway, I feel like this does very little to address my original point that there is this big red arrow pointing toward bayesian expected utility maximization and no big red arrow pointing toward Sun prayer decision rules.”
I don’t really understand your point, sorry. “Big red arrows towards X” only are a problem for doing Y if (1) they tell me that doing Y is inconsistent with doing [the form of X that’s necessary to avoid leaving value on the table]. And these arrows aren’t action-guiding for me unless (2) they tell me which particular variant of X to do. I’ve argued that there is no sense in which either (1) or (2) is true. Further, I think there are various big green arrows towards Y, as sketched in the SEP article and Mogensen paper I linked in the OP, though I understand if these aren’t fully satisfying positive arguments. (I tentatively plan to write such positive arguments up elsewhere.)
I’m just not swayed by vibes-level “arrows” if there isn’t an argument that my approach is leaving value on the table by my lights, or that you have a particular approach that doesn’t do so.
And indeed, it is easy to come up with a case where the action that gets chosen is not best according to any distribution in your set of distributions: let there be one action which is uniformly fine and also for each distribution in the set, let there be an action which is great according to that distribution and disastrous according to every other distribution; the uniformly fine action gets selected, but this isn’t EV max for any distribution in your representor.
Oops sorry, my claim had the implicit assumptions that (1) your representor includes all the convex combinations, and (2) you can use mixed strategies. ((2) is standard in decision theory, and I think (1) is a reasonable assumption — if I feel clueless as to how much I endorse distribution p vs distribution q, it seems weird for me to still be confident that I don’t endorse a mixture of the two.)
If those assumptions hold, I think you can show that the max-regret-minimizing action maximizes EV w.r.t. some distribution in your representor. I don’t have a proof on hand but would welcome counterexamples. In your example, you can check that either the uniformly fine action does best on a mixture distribution, or a mix of the other actions does best (lmk if spelling this out would be helpful).
Oh ok yea that’s a nice setup and I think I know how to prove that claim — the convex optimization argument I mentioned should give that. I still endorse the branch of my previous comment that comes after considering roughly that option though:
That said, if we conceive of the decision rule as picking out a single action to perform, then because the decision rule at least takes Pareto improvements, I think a convex optimization argument says that the single action it picks is indeed the maximal EV one according to some distribution (though not necessarily one in your set). However, if we conceive of the decision rule as giving preferences between actions or if we try to use it in some sequential setup, then I’m >95% sure there is no way to see it as EV max (except in some silly way, like forgetting you had preferences in the first place).
The branch that’s about sequential decision-making, you mean? I’m unconvinced by this too, see e.g. here — I’d appreciate more explicit arguments for this being “nonsense.”
To clarify, I think in this context I’ve only said that the claim “The minimax regret rule (sec 5.4.2 of Bradley (2012)) is equivalent to EV max w.r.t. the distribution in your representor that induces maximum regret” (and maybe the claim after it) was “false/nonsense” — in particular, because it doesn’t make sense to talk about a distribution that induces maximum regret (without reference to a particular action) — which I’m guessing you agree with.
I wanted to say that I endorse the following:
Neither of the two decision rules you mentioned is (in general) consistent with any EV max if we conceive of it as giving your preferences (not just picking out a best option), nor if we conceive of it as telling you what to do on each step of a sequential decision-making setup.
I think basically any setup is an example for either of these claims. Here’s a canonical counterexample for the version with preferences and the max_{actions} min_{probability distributions} EV (i.e., infrabayes) decision rule, i.e. with our preferences corresponding to the min_{probability distributions} EV ranking:
Let a and c be actions and let b be flipping a fair coin and then doing a or c depending on the outcome. It is easy to construct a case where the max-min rule strictly prefers b to a and also strictly prefers b to c, and indeed where this preference is strong enough that the rule still strictly prefers b to a small enough sweetening of a and also still prefers b to a small enough sweetening of c (in fact, a generic setup will have such a triple). Call these sweetenings a+ and c+ (think of these as a-but-you-also-get-one-cent or a-but-you-also-get-one-extra-moment-of-happiness or whatever; the important thing is that all utility functions under consideration should consider this one cent or one extra moment of happiness or whatever a positive). However, every EV max rule (that cares about the one cent) will strictly disprefer b to at least one of a+ or c+, because if that weren’t the case, the EV max rule would need to weakly prefer b over a coinflip between a+ and c+, but this is just saying that the EV max rule weakly prefers b to b+, which contradicts with it caring about sweetening. So these min preferences are incompatible with maximizing any EV.
There is a canonical way in which a counterexample in preference-land can be turned into a counterexample in sequential-decision-making-land: just make the “sequential” setup really just be a two-step game where you first randomly pick a pair of actions to give the agent a choice between, and then the agent makes some choice. The game forces the max min agent to “reveal its preferences” sufficiently for its policy to be revealed to be inconsistent with EV maxing. (This is easiest to see if the agent is forced to just make a binary choice. But it’s still true even if you avoid the strictly binary choice being forced upon the agent by saying that the agent still has access to (internal) randomization.)
Regarding the Thornley paper you link: I’ve said some stuff about it in my earlier comments; my best guess for what to do next would be to prove some theorem about behavior that doesn’t make explicit use of a completeness assumption, but also it seems likely that this would fail to relate sufficiently to our central disagreements to be worthwhile. I guess I’m generally feeling like I might bow out of this written conversation soon/now, sorry! But I’d be happy to talk more about this synchronously — if you’d like to schedule a meeting, feel free to message me on the LW messenger.
Sorry, I feel like the point I wanted to make with my original bullet point is somewhat vaguer/different than what you’re responding to. Let me try to clarify what I wanted to do with that argument with a caricatured version of the present argument-branch from my point of view:
your original question (caricatured): “The Sun prayer decision rule is as follows: you pray to the Sun; this makes a certain set of actions seem auspicious to you. Why not endorse the Sun prayer decision rule?”
my bullet point: “Bayesian expected utility maximization has this big red arrow pointing toward it, but the Sun prayer decision rule has no big red arrow pointing toward it.”
your response: “Maybe a few specific Sun prayer decision rules are also pointed to by that red arrow?”
my response: “The arrow does not point toward most Sun prayer decision rules. In fact, it only points toward the ones that are secretly bayesian expected utility maximization. Anyway, I feel like this does very little to address my original point that there is this big red arrow pointing toward bayesian expected utility maximization and no big red arrow pointing toward Sun prayer decision rules.”
(See the appendix to my previous comment for more on this.)
That said, I admit I haven’t said super clearly how the arrow ends up pointing to structuring your psychology in a particular way (as opposed to just pointing at a class of ways to behave). I think I won’t do a better job at this atm than what I said in the second paragraph of my previous comment.
I’m (inside view) 99.9% sure this will be false/nonsense in a sequential setting. I’m (inside view) 99% sure this is false/nonsense even in the one-shot case. I guess the issue is that different actions get assigned their max regret by different distributions, so I’m not sure what you mean when you talk about the distribution that induces maximum regret. And indeed, it is easy to come up with a case where the action that gets chosen is not best according to any distribution in your set of distributions: let there be one action which is uniformly fine and also for each distribution in the set, let there be an action which is great according to that distribution and disastrous according to every other distribution; the uniformly fine action gets selected, but this isn’t EV max for any distribution in your representor. That said, if we conceive of the decision rule as picking out a single action to perform, then because the decision rule at least takes Pareto improvements, I think a convex optimization argument says that the single action it picks is indeed the maximal EV one according to some distribution (though not necessarily one in your set). However, if we conceive of the decision rule as giving preferences between actions or if we try to use it in some sequential setup, then I’m >95% sure there is no way to see it as EV max (except in some silly way, like forgetting you had preferences in the first place).
I didn’t think about this as carefully, but >90% that the paragraph above also applies with minor changes.
I think I agree in some very weak sense. For example, when I’m trying to diagnose a health issue, I do want to think about which priors and likelihoods to use — it’s not like these things are immediately given to me or something. In this sense, I’m at some point contemplating many possible distributions to use. But I guess we do have some meaningful disagreement left — I guess I take the most appealing decision rule to be more like pure aggregation than you do; I take imprecise probabilities with maximality to be a major step toward madness from doing something that stays closer to expected utility maximization.
I don’t really understand your point, sorry. “Big red arrows towards X” only are a problem for doing Y if (1) they tell me that doing Y is inconsistent with doing [the form of X that’s necessary to avoid leaving value on the table]. And these arrows aren’t action-guiding for me unless (2) they tell me which particular variant of X to do. I’ve argued that there is no sense in which either (1) or (2) is true. Further, I think there are various big green arrows towards Y, as sketched in the SEP article and Mogensen paper I linked in the OP, though I understand if these aren’t fully satisfying positive arguments. (I tentatively plan to write such positive arguments up elsewhere.)
I’m just not swayed by vibes-level “arrows” if there isn’t an argument that my approach is leaving value on the table by my lights, or that you have a particular approach that doesn’t do so.
Oops sorry, my claim had the implicit assumptions that (1) your representor includes all the convex combinations, and (2) you can use mixed strategies. ((2) is standard in decision theory, and I think (1) is a reasonable assumption — if I feel clueless as to how much I endorse distribution p vs distribution q, it seems weird for me to still be confident that I don’t endorse a mixture of the two.)
If those assumptions hold, I think you can show that the max-regret-minimizing action maximizes EV w.r.t. some distribution in your representor. I don’t have a proof on hand but would welcome counterexamples. In your example, you can check that either the uniformly fine action does best on a mixture distribution, or a mix of the other actions does best (lmk if spelling this out would be helpful).
Oh ok yea that’s a nice setup and I think I know how to prove that claim — the convex optimization argument I mentioned should give that. I still endorse the branch of my previous comment that comes after considering roughly that option though:
The branch that’s about sequential decision-making, you mean? I’m unconvinced by this too, see e.g. here — I’d appreciate more explicit arguments for this being “nonsense.”
To clarify, I think in this context I’ve only said that the claim “The minimax regret rule (sec 5.4.2 of Bradley (2012)) is equivalent to EV max w.r.t. the distribution in your representor that induces maximum regret” (and maybe the claim after it) was “false/nonsense” — in particular, because it doesn’t make sense to talk about a distribution that induces maximum regret (without reference to a particular action) — which I’m guessing you agree with.
I wanted to say that I endorse the following:
Neither of the two decision rules you mentioned is (in general) consistent with any EV max if we conceive of it as giving your preferences (not just picking out a best option), nor if we conceive of it as telling you what to do on each step of a sequential decision-making setup.
I think basically any setup is an example for either of these claims. Here’s a canonical counterexample for the version with preferences and the max_{actions} min_{probability distributions} EV (i.e., infrabayes) decision rule, i.e. with our preferences corresponding to the min_{probability distributions} EV ranking:
Let a and c be actions and let b be flipping a fair coin and then doing a or c depending on the outcome. It is easy to construct a case where the max-min rule strictly prefers b to a and also strictly prefers b to c, and indeed where this preference is strong enough that the rule still strictly prefers b to a small enough sweetening of a and also still prefers b to a small enough sweetening of c (in fact, a generic setup will have such a triple). Call these sweetenings a+ and c+ (think of these as a-but-you-also-get-one-cent or a-but-you-also-get-one-extra-moment-of-happiness or whatever; the important thing is that all utility functions under consideration should consider this one cent or one extra moment of happiness or whatever a positive). However, every EV max rule (that cares about the one cent) will strictly disprefer b to at least one of a+ or c+, because if that weren’t the case, the EV max rule would need to weakly prefer b over a coinflip between a+ and c+, but this is just saying that the EV max rule weakly prefers b to b+, which contradicts with it caring about sweetening. So these min preferences are incompatible with maximizing any EV.
There is a canonical way in which a counterexample in preference-land can be turned into a counterexample in sequential-decision-making-land: just make the “sequential” setup really just be a two-step game where you first randomly pick a pair of actions to give the agent a choice between, and then the agent makes some choice. The game forces the max min agent to “reveal its preferences” sufficiently for its policy to be revealed to be inconsistent with EV maxing. (This is easiest to see if the agent is forced to just make a binary choice. But it’s still true even if you avoid the strictly binary choice being forced upon the agent by saying that the agent still has access to (internal) randomization.)
Regarding the Thornley paper you link: I’ve said some stuff about it in my earlier comments; my best guess for what to do next would be to prove some theorem about behavior that doesn’t make explicit use of a completeness assumption, but also it seems likely that this would fail to relate sufficiently to our central disagreements to be worthwhile. I guess I’m generally feeling like I might bow out of this written conversation soon/now, sorry! But I’d be happy to talk more about this synchronously — if you’d like to schedule a meeting, feel free to message me on the LW messenger.