I find it confusing that the only thing that matters to a rational agent is the expectation of utility, i.e., that the details of the probability distribution of utilities do not matter.
I understand that the VNM theorem proves this from what seem like reasonable axioms, but on the other hand it seems to me that there is nothing irrational about having different risk preferences. Consider the following two scenarios:
A: you gain utility 1 with probability 1
B: you gain utility 0 with probability 1/2 or utility 2 with probability 1/2
According to expected utility theory, it is irrational to be anything but indifferent between A and B. This seems wrong to me. I can even go a bit further; consider a third option:
C: you gain utility 0.9 with probability 1
Expected utility says it is irrational to prefer C to B, but this seems perfectly reasonable to me. It’s optimizing for the worst case instead of the average case. Is there a direct way of showing that preferring C to B is irrational?
This is part of the meaning of ‘utility’. In real life we often have risk-averse strategies where, for example, a 100% chance of 100 dollars is preferred to a 50% chance of losing 100 dollars and a 50% chance of gaining 350 dollars. But under the assumption that our risk-averse tendencies satisfy the coherence properties from the post, this simply means that our utility is not linear in dollars. As far as I know this captures most of the situations where risk aversion comes into play: often you simply cannot tolerate extremely negative outliers, meaning that your expected utility is mostly dominated by some large negative terms, and the best possible action is to minimize the probability that these outcomes occur.
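To make the nonlinearity concrete, here is a minimal sketch (the starting wealth of 200 dollars and the logarithmic utility function are illustrative assumptions, not anything from the post) of how a concave utility of dollars makes the sure gain the better choice even though the gamble has the higher expected dollar value:

```python
import math

# Illustrative assumptions: starting wealth of $200 and logarithmic
# (concave) utility, which makes the agent risk-averse in dollars.
wealth = 200
u = math.log

# Sure thing: gain $100 with certainty.
eu_sure = u(wealth + 100)

# Gamble: lose $100 or gain $350, each with probability 1/2.
# Its expected dollar gain (+$125) exceeds the sure thing's (+$100).
eu_gamble = 0.5 * u(wealth - 100) + 0.5 * u(wealth + 350)

print(round(eu_sure, 4))    # ~5.7038
print(round(eu_gamble, 4))  # ~5.4575 -- the sure gain wins in utility
```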
Also, there is the following: consider the case where you are repeatedly offered the bet from your example (B versus C). You know this in advance, and are allowed to redesign your decision theory from scratch (but you cannot change the definition of ‘utility’ or the bets being offered). What criteria would you use to determine whether B is preferable to C? The law of large numbers (together with the central limit theorem) says that in the long run, with probability 1, the option with the higher expected value will give you more utilons, and in fact this number is the only number you need to figure out which option is the better pick in the long run.
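A quick simulation sketch of this (the trial count is arbitrary): playing B against C repeatedly, the per-round averages converge to the expected values, so B pulls ahead in the long run.

```python
import random

random.seed(0)
n = 100_000  # number of repeated offers (arbitrary)

# B: utility 0 or 2, each with probability 1/2 (expected value 1).
total_b = sum(random.choice((0, 2)) for _ in range(n))

# C: utility 0.9 with certainty (expected value 0.9).
total_c = 0.9 * n

print(total_b / n)  # close to 1.0, by the law of large numbers
print(total_c / n)  # exactly 0.9
```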
The tricky bit is the question of whether this also applies to one-shot problems. Maybe there are rational strategies that use, say, the aggregate median instead of the expected value, which has the same limiting behaviour. My intuition is that this clashes with what we mean by ‘probability’: even if this particular problem is a one-off, our strategy should at least generalise to all situations where we talk about probability 1/2, and then the law of large numbers applies again. I also suspect that any agent that uses more information than the expected value to make this decision (in particular, one that occasionally deliberately chooses the option with lower expected utility) can be cheated out of utilons by a clever adversarial selection of offers, but this is just a guess.
The tricky bit is the question of whether this also applies to one-shot problems.

This is the crux. It seems to me that the expected utility framework means that if you prefer A to B in a one-time choice, then you must also prefer n repetitions of A to n repetitions of B, because the fact that you have larger variance for n=1 does not matter. This seems intuitively wrong to me.
I’d hold that it’s the reverse that seems more questionable. If n is a large number then the Law of Large Numbers may be applicable (“the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.”).
You may be interested in reading this series of posts.
Robyn Dawes makes a more detailed version of precisely this argument in Rational Choice in an Uncertain World. I summarize his argument in an old comment of mine. (The axiom you must reject, incidentally, if you find this sort of reasoning convincing, is the independence axiom.)
Thanks, I looked at the discussion you linked with interest. I think I understand my confusion a little better, but I am still confused.
I can walk through the proof of the VNM theorem and see where the independence axiom comes in and how it leads to u(A) = u(B) in my example. The axiom of independence itself feels unassailable to me, and I am not quite sure this is a strong enough argument against it. Maybe a more direct argument from the axiom of independence to an unintuitive result would be more convincing.
Maybe the answer is to read Dawes’s book; thanks for the reference.
The axiom of independence itself feels unassailable to me

Well, the axiom of independence is just that: an axiom. It doesn’t need to be assailed; we can take it as axiomatic, or not. If we do take it as axiomatic, certain interesting analyses become possible (depending on what other axioms we adopt). If we refuse to do so, then bad things happen—or so it’s claimed.
In any case, Dawes’ argument (and related ones) about the independence axiom fundamentally concerns the question of what properties of an outcome distribution we should concern ourselves with. (Here “outcome distribution” can refer to a probability distribution, or to some set of outcomes, distributed across time, space, individuals, etc., that is generated by some policy, which we may perhaps view as the output of a generator with some probability distribution.)
A VNM-compliant agent behaves as if it is maximizing the expectation of the utility of its outcome distribution. It is not concerned at all with other properties of that distribution, such as dispersion (i.e., standard deviation or some related measure) or skewness. (Or, to put it another way, a VNM-compliant agent is unconcerned with the form of the outcome distribution.)
What Dawes is saying is simply that, contra the assumptions of VNM-rationality, there seems to be ample reason to concern ourselves with, for instance, the skewness of the outcome distribution, and not just its expectation. But if we do prefer one outcome distribution to another, where the dis-preferred distribution has a higher expectation (but a “better” skewness), then we violate the independence axiom.
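A small numerical sketch of that point (the two distributions below are made-up illustrations, not Dawes’s numbers): the first has the higher expectation but a heavy left tail, and an agent that cares about skewness might reasonably prefer the second.

```python
from statistics import mean, pstdev

# Made-up outcome distributions (equally likely outcomes, in utility).
d1 = [-10, 4, 5, 5, 6]  # higher expectation, heavy left tail
d2 = [1, 1, 2, 2, 3]    # lower expectation, no catastrophic outcome

def skewness(xs):
    m, s = mean(xs), pstdev(xs)
    return mean(((x - m) / s) ** 3 for x in xs)

for name, d in (("d1", d1), ("d2", d2)):
    print(name, mean(d), round(pstdev(d), 2), round(skewness(d), 2))
# d1: mean 2.0, skewness ~ -1.46 (the bad outlier dominates the tail)
# d2: mean 1.8, skewness ~ +0.34
```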
I get what you are saying. You have convinced me that the following two statements are contradictory:
Axiom of Independence: preferring A to B implies preferring the mixture pA + (1-p)C to pB + (1-p)C, for any probability p and any lottery C.
The variance and higher moments of utility matter, not just the expected value.
My confusion is that intuitively it seems both must be true for a rational agent, but I guess my intuition is just wrong.
Thanks for your comments, they were very illuminating.
I think you are not allowed to refer explicitly to utility in the options. That is, an option of “I do not choose this option” is self-defeating and ill-formed. In another post I posited a risk-averse utility function that references the amount of paperclips. Maximising the utility function doesn’t maximise the expected number of paperclips. Even if the physical objects of interest are paperclips and we value them linearly, a paperclip is not synonymous with a utilon. It’s not a thing you can give out in an option.
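For illustration, here is a minimal sketch of such a risk-averse utility function over paperclips (the square root is a stand-in; the actual function from that other post isn’t reproduced here). Maximising expected utility picks the certain option even though the gamble has more expected paperclips:

```python
import math

# Stand-in risk-averse (concave) utility over paperclips.
def u(paperclips):
    return math.sqrt(paperclips)

# Option X: 100 paperclips for sure.
# Option Y: 0 or 250 paperclips, each with probability 1/2.
eu_x = u(100)                     # 10.0
eu_y = 0.5 * u(0) + 0.5 * u(250)  # ~7.91

# Y has more expected paperclips (125 vs 100), yet X has more expected
# utility: a paperclip is not synonymous with a utilon.
print(eu_x, eu_y)
```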
I think you are not allowed to refer explicitly to utility in the options.

I was going to answer that I can easily reword my example to not explicitly mention any utility values, but when I tried to do that, it very quickly led to something where it is obvious that u(A) = u(C). I guess my rewording was basically going through the steps of the proof of the VNM theorem.
I am still not sure I am convinced by your objection, as I don’t think there’s anything self-referential in my example, but that did give me some pause.
In a case where you are going to pick the option with less variance and less expected value over the one with more variance and more expected value, that option needs to have the bigger “utility number”. In order to get that, you need to mess with how utility is calculated. Then it becomes ambiguous whether the “utility-fruits” are redefined in the same go as we redefine how we compare options. If we name them “paperclips” it’s clear that they are not touched by such redefining.
It tripped a “type-unsafety” alarm, but the operation overall might be safe, as it doesn’t actualise the danger. For example, having an option of “plum + 2 utility” could give one agent “plum + apple” if it valued apples and “plum + pear” if it valued pears. I guess if you consistently replace all physical items with their utility values, this doesn’t happen.
In the case of “gain 1 utility with probability 1”, if your agent is risk-seeking it might give this option an “actual” utility of less than 1. In general, if we lose distribution independence, we might need to retain the information about our sub-outcomes rather than collapsing it to a single number. A risk-seeking agent would clearly prefer A = (5%: 0, 90%: 1, 5%: 2) to B = (100%: 1). But the same risk-seeking applied to compound lotteries would make it prefer C = (5%: 0, 90%: A, 5%: A+A) over A. When comparing C and A, it is not sufficient to know that their expected utilities are both 1.
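A short sketch of that last comparison, flattening the compound lottery C into a single outcome distribution: A and C have the same expected utility of 1 but different spreads, which is exactly the extra information a risk-seeking agent needs.

```python
from itertools import product

# Lottery A from the comment: utility 0, 1, or 2 with these probabilities.
A = {0: 0.05, 1: 0.90, 2: 0.05}

# A+A: the summed outcome of two independent plays of A.
AA = {}
for (x, p), (y, q) in product(A.items(), repeat=2):
    AA[x + y] = AA.get(x + y, 0) + p * q

# C = (5%: 0, 90%: A, 5%: A+A), flattened into one distribution.
C = {0: 0.05}
for dist, w in ((A, 0.90), (AA, 0.05)):
    for x, p in dist.items():
        C[x] = C.get(x, 0) + w * p

def moments(d):
    m = sum(x * p for x, p in d.items())
    v = sum((x - m) ** 2 * p for x, p in d.items())
    return m, v

print(moments(A))  # mean 1.0, variance 0.1 (up to float rounding)
print(moments(C))  # mean 1.0, variance 0.2 -- same mean, twice the spread
```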