A Comment on Expected Utility Theory
Expected utility theory, or expected value decision making as the case may be, is quite interesting, I suppose. In times past (a few months at longest, as I am a neophyte to rationality) I habitually trusted the answers expected utility theory provided without bothering to test them, or to ponder for myself why they would be advisable. I first learned the concept of expected value in statistics, and when we studied it in Operations Research as part of an introduction to decision theory, it simply seemed to make sense.

However, after a recent experiment I ran (a decision problem between a guaranteed \$250,000 and a 10\% chance at \$10,000,000) I began to doubt the appropriateness of expected utility theory. Over 85\% of subjects chose the first option, despite the latter having an expected value four times higher than the former. I myself realised that the only scenario in which I would choose the \$10,000,000 gamble was one in which \$250,000 was an amount I could afford to pass up. I am fully cognisant of expected utility theory, and my decision to pick the first option did not seem to be prey to any bias, so a suspicion about the efficacy of expected utility theory began to develop in my mind.

I took my experiment to http://www.reddit.com/r/lesswrong, a community I expected would contain more rational decision makers; they only confirmed the decision making of the first group. I realised then that something was wrong: my map didn’t reflect the territory. If expected utility theory were truly so sound, then a community of rationalists should have adhered to its dictates. I filed this information away at the back of my mind. My brain kept working on it, and today, while I was reading “Thinking, Fast and Slow” by Daniel Kahneman, it delivered an answer.
I do not consider myself a slave to rationality; it is naught but a tool for me to achieve my goals, a tool to help me “win”, and to do so consistently. If any ritual of cognition causes me to lose, then I abandon it. There is no sentimentality on the road to victory, and above all I endeavour to be efficient, ruthlessly so if needed. As such, I am willing to abandon any theory of decision making when I determine that it would cause me to lose. Nevertheless, as a rationalist I had to wonder: if expected utility theory were so feeble a stratagem, why had it stuck around for so long? I decided to explore the theory from its roots, to derive it for myself, so to speak, and to figure out where the discrepancy had come from.
Expected utility theory aims to maximise the expected utility of a decision, which is naught but the average utility of that decision: the average payoff.
Average payoff is given by the formula:
\[E_{j} = \sum_{i} Pr_i \, G_{ij} \tag{1}\]
Where
\(E_j\) = Expected value of Decision \(j\)
\(Pr_i\) = Probability of Scenario \(i\)
\(G_{ij}\) = Payoff of Decision \(j\) under Scenario \(i\).
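As a quick sanity check of \((1)\), here is a minimal sketch in Python, using the probabilities and payoffs from the experiment above:

```python
# Expected value of each decision: E_j = sum over i of Pr_i * G_ij.
def expected_value(outcomes):
    """outcomes: list of (probability, payoff) pairs for one decision."""
    return sum(p * g for p, g in outcomes)

sure_thing = [(1.0, 250_000)]               # guaranteed $250,000
gamble = [(0.9, 0), (0.1, 10_000_000)]      # 10% chance of $10,000,000

print(expected_value(sure_thing))   # 250000.0
print(expected_value(gamble))       # 1000000.0 -- four times higher
```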
What caught my interest when I decided to investigate expected utility theory from its roots, was the use of probability in the formula.
Now the definition of probability is:
\[Pr(i) = \lim_{n \to \infty} \frac{f_i}{n} \tag{2}\]
Where \(f_i\) is the frequency of \(i\), i.e. the number of times Scenario \(i\) occurs in \(n\) trials.
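A minimal simulation of definition \((2)\), using the 10\% winning scenario from the experiment (the code is illustrative only):

```python
import random

# Relative frequency f_i / n approaches the probability as n grows,
# per equation (2). Here: the 10% winning scenario from the experiment.
rng = random.Random(0)

def relative_frequency(p, n):
    wins = sum(rng.random() < p for _ in range(n))
    return wins / n

for n in (10, 1_000, 100_000):
    print(n, relative_frequency(0.1, n))
# The printed estimates settle near 0.1 as n increases.
```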
If I keep the definition of probability in mind, I find something interesting: expected utility theory maximises my payoff in the long run. For decision problems that are iterated, in which I play the game several times, expected utility theory is my best bet. The closer the number of iterations gets to infinity, the closer the ratio \(f_i/n\) above gets to the probability.
Substituting \((2)\) into \((1)\) we get:
\[E_j = \sum_{i} \frac{f_i}{n} \, G_{ij} \tag{3}\]
What expected utility theory tells us is to choose the decision with the highest \(E_j\); this is only guaranteed to be the optimal decision when \((1)\) and \((3)\) coincide, i.e. when (a simulation contrasting the two regimes follows this list):
1. The decision problem has a (sufficiently) large number of iterations, and/or
2. The decision problem involves a (sufficiently) large number of scenarios.
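Here is the simulation referred to above, contrasting the one-shot and iterated regimes under the experiment's stated probabilities:

```python
import random

rng = random.Random(0)

def gamble():
    # 10% chance of $10,000,000, otherwise nothing.
    return 10_000_000 if rng.random() < 0.1 else 0

# One-shot regime: most single plays of the gamble pay $0.
plays = [gamble() for _ in range(100_000)]
print(sum(x == 0 for x in plays) / len(plays))   # ~0.9

# Iterated regime: the per-game average converges on the expected
# value of $1,000,000, beating the sure $250,000.
print(sum(plays) / len(plays))                   # ~1,000,000
```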
After much deliberation, I reached a conclusion: in any non-iterated game in which a single scenario has an overwhelmingly high probability \(Pr = 1 - \epsilon\), the maximum likelihood approach (judging each decision by its payoff under the most likely scenario) is the rational decision-making approach. Personally, I believe \(\epsilon\) should be \(\ge 0.005\), and I set mine at around \(0.1\).
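Below is a minimal sketch of this rule in Python. The function names and the exact form of the rule are my own illustration of the idea, not an established algorithm:

```python
def choose(decisions, iterated, epsilon=0.1):
    """decisions: {name: [(probability, payoff), ...]}.

    Hypothetical rule sketched above: in a one-shot game where some
    scenario has probability >= 1 - epsilon, judge each decision by
    the payoff of its most likely scenario (maximum likelihood);
    otherwise fall back to expected value."""
    def ev(outcomes):
        return sum(p * g for p, g in outcomes)
    def ml(outcomes):
        return max(outcomes)[1]   # payoff under the most probable scenario
    dominant = any(p >= 1 - epsilon
                   for outcomes in decisions.values()
                   for p, _ in outcomes)
    score = ev if iterated or not dominant else ml
    return max(decisions, key=lambda name: score(decisions[name]))

options = {"sure": [(1.0, 250_000)],
           "gamble": [(0.9, 0), (0.1, 10_000_000)]}
print(choose(options, iterated=False))   # 'sure'   (maximum likelihood)
print(choose(options, iterated=True))    # 'gamble' (expected value)
```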
I may revisit this write-up in future and add a mathematical argument for the application of the maximum likelihood approach over the expected utility approach, but for now I shall posit a simpler argument:
The expected utility approach is sensible only in that it maximises winnings in the long run; by its very design, it is intended for games that are iterated and/or involve a large number of scenarios. In games where this is not true (few scenarios and a single instance) there is enough variation in which event actually occurs that the realised payoff deviates significantly from the expected payoff. To ignore this deviation is oversimplification, and, I'll argue, irrational. In the experiment listed above, the actual payoff of the second decision was either \$0 or \$10,000,000, the former scenario having a likelihood of 90\% and the latter 10\%. The expected value is \$1,000,000, but the standard deviation of the payoffs from the expected value, in this case \$3,000,000, is 300\% of the mean. In such cases, I conclude that the expected utility approach is simply unreliable, and expectably so: it was never designed for such problems in the first place (pun intended).
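For the record, these figures can be verified directly from the stated payoffs and probabilities:
\[
\begin{aligned}
E &= 0.9 \cdot \$0 + 0.1 \cdot \$10{,}000{,}000 = \$1{,}000{,}000,\\
\sigma &= \sqrt{0.9\,(0 - 10^{6})^{2} + 0.1\,(10^{7} - 10^{6})^{2}} = \sqrt{9 \times 10^{12}} = \$3{,}000{,}000.
\end{aligned}
\]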
Aside from the issue that dollars ought to have diminishing returns in utility of some form, expected utility is perfectly capable of modeling situations where ergodicity/ensemble assumptions are violated and a greedy one-step policy of making the decision with maximum immediate expected utility doesn’t work. You just need to actually include the full dynamics of the situation in your model and plan appropriately.
I realize that this may be shocking, but in order to get a correct answer, you need to ask the correct question. ‘Pray, Mr Babbage, if you put the wrong numbers in, will the right numbers come out?’ You may want to study the uses of expected utility or decision theory a little bit more before trying to refute and replace it...
For example, if you have a limited amount of money and are vulnerable to gambler’s ruin, expected utility will still give you the correct answer if you set up an environment model and do planning or backward induction to calculate the optimal policy (which converges on the Kelly criterion with longer horizons, and Kelly converges on greedy 1-step EU-maximization with larger capital). I’ve analyzed one game where greedy 1-step EU-maximization almost always leads to zero gains, but a decision tree, using nothing but expected utility maximization and a correct model of the game, leads to maximal gains >94% of the time: https://www.gwern.net/Coin-flip Nothing new here; backward induction and expected utility go back at least to von Neumann and Morgenstern.
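For a flavour of what such planning looks like, here is a minimal backward-induction sketch; the parameters (a 60% coin, a $25 bankroll, a $250 cap, 20 rounds) are illustrative assumptions for this sketch, not a reproduction of the linked analysis:

```python
from functools import lru_cache

# Illustrative capped betting game: each round you may bet any
# whole-dollar amount on a coin that pays even money and lands in
# your favour with probability P. Parameters are assumptions.
P, CAP, ROUNDS = 0.6, 250, 20

@lru_cache(maxsize=None)
def value(wealth, rounds_left):
    """Maximum expected final wealth over the whole game tree, using
    nothing but expected-value maximization at every node."""
    if rounds_left == 0 or wealth == 0 or wealth >= CAP:
        return wealth
    return max(P * value(min(wealth + bet, CAP), rounds_left - 1)
               + (1 - P) * value(wealth - bet, rounds_left - 1)
               for bet in range(wealth + 1))

# Greedy 1-step EU-maximization bets everything each round (EV of a
# bet b is wealth + 0.2*b here) and is ruined on the first loss;
# planning over the full tree does strictly better in expectation,
# since near the cap it stops risking the whole bankroll.
print(value(25, ROUNDS))
```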
The article was admittedly premature.
I strongly recommend that no one attempt to use the phrase “expected utility” without understanding, at a reasonable level of detail, the proof of the von Neumann-Morgenstern theorem. For my take on the proof, see this blog post. Among other things, understanding the proof teaches you the following important lessons (a compact statement of the theorem follows the list):
Utilities can be assigned completely arbitrarily. All the vNM theorem tells you is that a collection of preferences satisfying some axioms (“being vNM rational”) is equivalent to a collection of preferences described by maximizing expected utility with respect to some utility function, but it puts no constraints whatsoever on the utility function.
The vNM theorem also does not imply that you ought to make decisions by maximizing expected utility, only that if you are vNM rational then your preferences can be described in this way. (Also, humans aren’t vNM rational and it’s not at all clear that we should try to be, just so we’re all clear.)
The vNM theorem makes no mention of time or of making multiple decisions; the justification for maximizing expected utility, in this setup, has absolutely nothing to do with long-run averages of repeated decisions. It is, in some sense, a mathematical trick for expressing certain kinds of preferences, and that’s it. In the proof of the vNM theorem, utility falls out as “that thing which we must be maximizing the expected value of, if we’re vNM rational.”
The standard way to interpret the relevance of the vNM theorem for an agent acting in the world over time is that your preferences should actually be over world-histories, not world-states; hence if you’re vNM rational then your utility function takes as input a world-history, and you’re maximizing expected utility with respect to probability distributions over world-histories (possibly once, ever: say, when you make a decision at the beginning of time about what you’re going to do in all possible futures). Needless to say, nobody has ever done this, nor will anyone ever do so.
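For reference, the theorem under discussion can be stated compactly (standard textbook form; the notation is mine): if a preference relation \(\succeq\) over lotteries satisfies completeness, transitivity, continuity, and independence, then there exists a function \(u\) on outcomes such that
\[
L \succeq M \iff \sum_i p^L_i\, u(x_i) \;\ge\; \sum_i p^M_i\, u(x_i),
\]
and \(u\) is unique up to positive affine transformation \(u \mapsto a u + b\) with \(a > 0\).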
Anyone who’s actually interested in formal theories of how to make decisions over time should be learning about reinforcement learning, which is a much richer framework than the vNM theorem and about which there’s much more to say.
Expected utility is not the same thing as expected dollars. As AgentStonecutter explained to you on Reddit last month, the standard assumption of diminishing marginal utility of money is entirely sufficient to account for preferring the guaranteed $250,000; no need to patch standard decision theory. (The von Neumann–Morgenstern theorem doesn’t depend on decisions being repeated; if you want to escape your decisions being describable as the maximization of some utility function, you have to reject one of the axioms, even if your decision is the only decision in the universe.)
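To make that concrete, take a square-root utility function (chosen purely for illustration, not as anyone's actual utility of money):
\[
u(x) = \sqrt{x}: \quad u(\$250{,}000) = 500 \;>\; 0.1 \cdot u(\$10{,}000{,}000) \approx 0.1 \cdot 3162.3 \approx 316,
\]
so a concave (risk-averse) utility function prefers the guaranteed \$250,000 while still maximizing expected utility.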
Not really. See this post. Still, the answer to that could be, and probably is, that people’s behavior in these matters is in fact unreasonable, and they should change it. I agree that the basic problem in the post here is that expected utility is equated with expected dollars. In utility theory, a 10% chance of 10 times the utility is exactly equal to a 100% chance of the base utility. This is “by definition”, and if you prefer the 100% chance, you are saying that the other choice has less than 10 times the value.