Utility is unintuitive

EDIT: My original post was wrong. I will leave it quoted at the end for the purposes of preserving information, but it is now replaced with a new post that correctly expresses my sentiments. The original title of this post was “expected utility maximization is not rational”.
As many people are probably aware, there is a theorem, called the Von Neumann-Morgenstern utility theorem, which states that anyone expressing consistent preferences must be maximizing the expected value of some function. The definition of consistent preferences is as follows:
Let A, B, and C be probability distributions over outcomes. Let A < B denote that B is preferred to A, and A = B denote that someone is indifferent between A and B. Then we assume
1. Either A < B, A > B, or A = B. In other words, you have to express a preference. This is reasonable because in the real world, you always have to make a decision (even “lack of action” is a decision).
2. If A < B, and B < C, then A < C. I believe that this is also clearly reasonable. If you have three possible actions, leading to distributions over outcomes A, B, and C, then you have to choose one of the three, meaning one of them is always preferred. So you can’t have cycles of preferences.
3. If A < B, then (1-x)A + xC < B for some x in (0,1) that is allowed to depend on A, B, and C. In other words, if B is preferred to A, then B is also preferred to sufficiently small changes to A.
4. If A < B, then pA + (1-p)C < pB + (1-p)C for all p in (0,1). This is the least intuitive of the four axioms to me, and the one that I initially disagreed with. But I believe that you can argue in favor of it as follows: I flip a coin that lands heads with probability p, and draw from X if it comes up heads and from C if it comes up tails. I let you choose whether X is A or B. It seems clear that if you prefer B to A, then you should choose B in this situation. However, I have not thought about this long enough to be completely sure that this is the case. Most other people seem to also think this is a reasonable axiom, so I’m going to stick with it for now.
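As an illustrative sketch of that coin-flip argument (my own construction, with made-up lotteries A, B, and C, not anything from the theorem itself): the two compound lotteries pA + (1-p)C and pB + (1-p)C differ only in what happens on heads, which is exactly the original choice between A and B.

```python
import random

def sample_lottery(lottery):
    """Draw one outcome from a lottery given as [(probability, outcome), ...]."""
    r, total = random.random(), 0.0
    for prob, outcome in lottery:
        total += prob
        if r < total:
            return outcome
    return lottery[-1][1]  # guard against floating-point rounding

def sample_mixture(p, X, C):
    """The compound lottery pX + (1-p)C: flip a coin that lands heads with
    probability p, then draw from X on heads and from C on tails."""
    return sample_lottery(X) if random.random() < p else sample_lottery(C)

# Hypothetical lotteries over monetary outcomes (illustration only).
A = [(1.0, 0)]                 # $0 for sure
B = [(0.5, -1), (0.5, 10)]     # coin flip between -$1 and $10
C = [(1.0, 100)]               # $100 for sure

# Choosing between pA+(1-p)C and pB+(1-p)C only changes the heads branch,
# i.e. it reduces to the original choice between A and B.
p = 0.3
samples_with_A = [sample_mixture(p, A, C) for _ in range(5)]
samples_with_B = [sample_mixture(p, B, C) for _ in range(5)]
```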
Given these axioms, we can show that there exists a real-valued function u over outcomes such that A < B if and only if E_A[u] < E_B[u], where E_X denotes the expected value with respect to the distribution X.
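To make the statement concrete, here is a minimal sketch (not from the original post); the lottery representation and the particular function u below are hypothetical, since the theorem only asserts that *some* such u exists.

```python
def expected_utility(lottery, u):
    """E_X[u] for a lottery given as [(probability, outcome), ...]."""
    return sum(prob * u(outcome) for prob, outcome in lottery)

# Hypothetical utility function over monetary outcomes; the theorem says
# nothing about its shape, only that some such function represents the
# person's preferences.
u = lambda x: x ** 0.5 if x >= 0 else x

A = [(1.0, 0)]                # $0 for sure
B = [(0.5, 0), (0.5, 9)]      # coin flip between $0 and $9

# A < B exactly when E_A[u] < E_B[u].
prefers_B = expected_utility(A, u) < expected_utility(B, u)
print(prefers_B)
```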
Now, the important thing to note here is that this is an existence proof only. The function u doesn’t have to look at all reasonable; it merely assigns a value to every possible outcome. In particular, even if E_1 and E_2 seem like completely unrelated events, there is no reason, as far as I can tell, why u(E_1 and E_2) has to have anything to do with u(E_1) + u(E_2). Among other things, u is only defined up to a positive affine transformation (a rescaling plus an additive constant), so not only is there no reason for such additivity to hold, it will be completely false for almost all possible utility functions, *even if you keep the person whose utility you are considering fixed*.
In particular, it seems ridiculous that we would worry about an outcome that only occurs with probability 10^-100. What this actually means is that our utility function is bounded, always much smaller than 10^100; or rather, that the ratio between the utility difference of trivially small changes in outcome and the utility difference of arbitrarily large changes in outcome is always much larger than 10^-100. This is how to avoid issues like Pascal’s mugging, even in the least convenient possible world (since utility is an abstract construction, no universe can “make” a utility function become unbounded).
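As a rough numerical sketch of why boundedness defuses the mugging (the bound and the certain cost below are made-up numbers, not anything from the original post):

```python
# Hypothetical bound on how much utility can differ between any two outcomes.
U_MAX_DIFFERENCE = 1e6

mugging_probability = 1e-100

# The most a 1e-100-probability event can contribute to expected utility is
# the probability times the largest possible utility difference.
max_expected_gain = mugging_probability * U_MAX_DIFFERENCE   # 1e-94

# Compare with a tiny but certain cost (value chosen purely for illustration).
certain_cost = 1e-3

print(max_expected_gain < certain_cost)  # True: the mugging never pays
```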
What this means in particular is that saying that someone must maximize expected utility to be rational is not very productive. Unless the other person has a sufficiently good technical grasp of what this means, they will probably do the wrong thing. Also, unless *you* have a good technical grasp of what it means, something that appears to violate expected utility maximization might not. Remember, because utility is an artificial construct that has no reason to look reasonable, someone with completely reasonable preferences could have a very weird-*looking* utility function. Instead of telling people to maximize expected utility, we should identify which of the four above axioms they are violating and explain why they are being irrational (or, if the purpose is to educate in advance, explain why the four axioms above should be respected). [Note, however, that just because a perfectly rational person *always* satisfies the above axioms doesn’t mean that you will be better off if you satisfy them more often. Your preferences might have a complicated cycle that you are unsure how to correctly resolve, and picking a resolution at random is unlikely to be a good idea.]
Now, utility is this weird function that we don’t understand at all. Why, then, does it seem like there’s something called utility that **both** fits our intuitions and that people should be maximizing? The answer is that in many cases utility *can* be equated with something like money plus risk aversion. The reason is the law of large numbers, formalized through various bounds such as Hoeffding’s inequality and the Chernoff bound, as well as more powerful arguments like concentration of measure. What these arguments say is that if you have a large number of random variables that are sufficiently uncorrelated and have sufficiently small standard deviation relative to the mean, then with high probability their sum is very close to its expected value. So when our variables all have means that are reasonably close to each other (as is the case for most everyday events), we can say that the total *monetary* value of our combined actions will be very close to the sum of the expected monetary values of our individual actions (and likewise for other quantities like time). So in situations where, e.g., your goal is to spend as little time on undesirable work as possible, you want to minimize expected time spent on undesirable work, **as a heuristic that holds in most practical cases**. While this might make it *look* like your utility function is time in this case, I believe that the resemblance is purely coincidental, and you certainly shouldn’t be willing to make very low-success-rate gambles with large time payoffs.
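Here is a small simulation sketch of that concentration phenomenon (the distribution and parameters are made up for illustration): summing many independent, everyday-scale quantities lands very close to the expected sum almost every time.

```python
import random

random.seed(0)
num_trials, num_variables = 200, 10_000

# Each variable: an everyday-scale quantity, e.g. uniform on [0, 10] minutes.
expected_sum = num_variables * 5.0

within_two_percent = 0
for _ in range(num_trials):
    total = sum(random.uniform(0, 10) for _ in range(num_variables))
    if abs(total - expected_sum) <= 0.02 * expected_sum:
        within_two_percent += 1

# With 10,000 terms, nearly every trial lands within 2% of the expected sum,
# which is why minimizing expected time is a good heuristic in such cases.
print(within_two_percent / num_trials)
```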
Old post:
I’m posting this to the discussion because I don’t plan to make a detailed argument, mainly because I think this point should be extremely clear, even though many people on LessWrong seem to disagree with me.
Maximizing expected utility is not a terminal goal, it is a useful heuristic. To see why always maximizing expected utility is clearly bad, consider an action A with a 10^-10 chance of giving you 10^100 units of utility, and a 1 - 10^-10 chance of losing you 10^10 units of utility. Then expected utility maximization requires you to perform A, even though it is obviously a bad idea. I believe this has been discussed here previously as Pascal’s mugging.
For some reason, this didn’t lead everyone to the obvious conclusion that maximizing expected utility is the wrong thing to do, so I’m going to try to dissolve the issue by looking at why we would want to maximize expected utility in most situations. I think once this is accomplished it will be obvious why there is no particular reason to maximize expected utility for very low-probability events (in fact, one might consider having a utility function over probability distributions rather than actual states of the world).
The reason that you normally want to maximize expected utility is because of the law of large numbers, formalized through various bounds such as Hoeffding’s inequality and the Chernoff bound, as well as more powerful arguments like concentration of measure. What these arguments say is that if you have a large number of random variables that are sufficiently uncorrelated and that have sufficiently small variance relative to the mean, then with high probability their sum is very close to their expected sum. Thus for events with probabilities that are bounded away from 0 and 1 you always expect your utility to be very close to your expected utility, and should therefore maximize expected utility in order to maximize actual utility. But once the probabilities get small (or the events correlated, e.g. you are about to make an irreversible decision), these bounds no longer hold and the reasons for maximizing expected utility vanish. You should instead consider what sort of distribution over outcomes you find desirable.