Ok, so this is a lot to take in, but I’ll give you my first takes as a start.
My only disagreement prior to your previous comment seems to be in the legibility of the desirability axiom for U(A∨B) which I think should contain some reference to the actual probabilities of A and B.
Now, I gather that this disagreement probably originates from the fact that I defined U({})=0 while in your framework U(⊤)=0.
Something that appears problematic to me is if we consider the tautology (in Jeffrey notation) U(Doom∨¬Doom)=P(Doom)U(Doom)+P(¬Doom)U(¬Doom)=0. This would mean that reducing the risk of Doom has 0 net utility. In particular, certain Doom and certain ¬Doom are equally preferable (=0), which I don't think either of us agrees with. Perhaps I've missed something.
Oh, I think I see what confuses me. In the subjective utility framework the expected utilities are shifted to 0 after each Bayesian update?
So then the utility of doing action a to prevent Doom is (P(Doom|a)−P(Doom))U(Doom)+(P(¬Doom|a)−P(¬Doom))U(¬Doom). But when action a has been done, the utility scale is shifted again.
I’m not perfectly sure what the connection with Bayesian updates is here. In general it is provable from the desirability axiom that
U(a)=P(Doom|a)U(Doom∧a)+P(¬Doom|a)U(¬Doom∧a).
This is because any A (e.g. a) is logically equivalent to (A∧B)∨(A∧¬B) for any B (e.g. Doom), which also leads to the "law of total probability". Then we have a disjunction which we can use with the desirability axiom. The denominator cancels out and gives us P(Doom|a) in the numerator instead of P(Doom∧a), which is very convenient because we presumably don't know the prior probability of an action, P(a). After all, we want to figure out whether we should do a (= make P(a)=1) by calculating U(a) first. It is also interesting to note that a utility maximizer (an instrumentally rational agent) indeed chooses the actions with the highest utility, not the actions with the highest expected utility, as is sometimes claimed.
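For a concrete check, here is a minimal Python sketch with made-up numbers, assuming (as a toy model) that utilities are probability-weighted averages of world-level desirabilities, renormalized so that U(⊤)=0. It confirms the identity above numerically:

```python
from math import isclose

# Toy model with four complete worlds (a?, Doom?) and made-up numbers.
# u_raw are arbitrary world-level desirabilities; shifting them makes U(⊤) = 0.
P = {("a", "Doom"): 0.05, ("a", "¬Doom"): 0.15, ("¬a", "Doom"): 0.30, ("¬a", "¬Doom"): 0.50}
u_raw = {("a", "Doom"): -100.0, ("a", "¬Doom"): 10.0, ("¬a", "Doom"): -100.0, ("¬a", "¬Doom"): 12.0}
shift = sum(P[w] * u_raw[w] for w in P)
u = {w: u_raw[w] - shift for w in P}

def prob(X):
    return sum(P[w] for w in X)

def U(X):
    # Jeffrey-style utility of a proposition X (a set of worlds):
    # probability-weighted average of world desirabilities within X.
    return sum(P[w] * u[w] for w in X) / prob(X)

a = {w for w in P if w[0] == "a"}
doom = {w for w in P if w[1] == "Doom"}
not_doom = set(P) - doom

lhs = U(a)
rhs = (prob(a & doom) / prob(a)) * U(a & doom) + (prob(a & not_doom) / prob(a)) * U(a & not_doom)
assert isclose(lhs, rhs)
print(lhs, rhs)
```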
Yes, after you do an action you become certain you have done it; its probability becomes 1 and its utility 0. But I don't see that as counterintuitive, since "doing it again", or "continuing to do it", would be a different action which does not have utility 0. Is that what you meant?
Well, deciding to do action a would also make it utility 0 (edit: or close enough considering remaining uncertainties) even before it is done. At least if you’re committed to the action and then you could just as well consider the decision to be the same as the action.
It would mean that a “perfect” utility maximizer always does the action with utility 0 (edit: but the decision can have positive utility(?)). Which isn’t a problem in any way except that it is alien to how I usually think about utility.
Put another way: while I'm thinking about which possible action I should take, the utilities fluctuate until I've decided on an action, and then that action has utility 0. I can see the appeal of just considering changes to the status quo, but the part where everything jumps around makes it an extra thing for me to keep track of.
The way I think about it: The utility maximizer looks for the available action with the highest utility and only then decides to do that action. A decision is the event of setting the probability of the action to 1, and, because of that, its utility to 0. It’s not that an agent decides for an action (sets it to probability 1) because it has utility 0. That would be backwards.
There seems to be some temporal dimension involved, some "updating" of utilities. Similar to how assuming the principle of conditionalization
P_t2(H) = P_t1(H|E)
formalizes classical Bayesian updating when something is observed. It sets P_t2(H) to a new value, and (or because?) it sets P_t2(E) to 1.
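As a minimal numeric illustration of that principle (my own made-up numbers; H and E are just a toy hypothesis and a toy piece of evidence):

```python
from math import isclose

# Hypothetical prior over four worlds, classified by a hypothesis H and evidence E.
P1 = {("H", "E"): 0.20, ("H", "¬E"): 0.10, ("¬H", "E"): 0.30, ("¬H", "¬E"): 0.40}
H = {w for w in P1 if w[0] == "H"}
E = {w for w in P1 if w[1] == "E"}

def prob(P, X):
    return sum(P[w] for w in X)

# Classical conditionalization on observing E: P_t2(w) = P_t1(w|E).
P2 = {w: (P1[w] / prob(P1, E) if w in E else 0.0) for w in P1}

assert isclose(prob(P2, E), 1.0)                              # P_t2(E) = 1
assert isclose(prob(P2, H), prob(P1, H & E) / prob(P1, E))    # P_t2(H) = P_t1(H|E)
print(prob(P1, H), prob(P2, H))
```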
A rule for utility updating over time, on the other hand, would need to update both probabilities and utilities, and I’m not sure how it would have to be formalized.
Ah, those timestep subscripts are just what I was missing. I hadn’t realised how much I needed that grounding until I noticed how good it felt when I saw them.
So to summarise (below, all sets have mutually exclusive members): in Jeffrey-ish notation we have the axiom
U(S) = (1/P(S)) ∑_{s∈S} P(s) U(s)
and normally you would want to indicate in the left-hand side what distribution you have over S. However, we always renormalize U such that the distribution is our current prior. We can indicate this by labeling the utilities with the timestep they belong to (the agent should probably be included as well, but let's skip this for now):
U_t(S) = (1/P(S)) ∑_{s∈S} P(s) U_t(s)
That way we don’t have to worry about U being shifted during the sum in the right hand side or something. (I mean notationally that would just be absurd, but if I would sit down and estimate the consequences of possible actions I wouldn’t be able to not let this shift my expectation for what action I should take before I was done.).
We can also bring up the utility of an action a to be
U_t(a) = ∑_{ω∈Ω} (P(ω|a) − P(ω)) U_t(ω∧a)
Furthermore, for most actions it is quite clear that we can drop the subscript t, as we know that we are considering the same timestep consistently within the same calculation:
U(A∨B) = (P(A)U(A) + P(B)U(B)) / (P(A) + P(B)), if P(A∧B) = 0
Now I’m fine with this because I will have those subscript ts in the back of my mind.
I still haven’t commented on U(A∨B) in general or U(A|B). My intuition is that they should be able to be described from U(A), U(B) and U(A∧B), but it isn’t immediately obvious to me how to do that while keeping U(⊤)=0.
I tried considering a toy case where A=s1∨s2 and B=s2∨s3 (S={s1,s2,s3}) and then
U(A∨B) = U(s1∨s2∨s3) = (1/P(S)) ∑_{s∈S} P(s) U(s)
but I couldn’t see how it would be possible without assuming some things about how U(A), U(B) and U(A∧B) relate to each other which I can’t in general.
Regarding the time stamp: Yeah, this is the right way to think about it, at least in the case of subjective utility theory, where utilities represent desires and probabilities represent beliefs, and it is also the right way to think about Bayesianism (subjective probability theory). U and P only represent the subjective state of an agent at a particular point in time. They don't say anything about how they should be changed over time. They only say that at any point in time, these functions (the agent's) should satisfy the axioms.
Rules for change over time would need separate assumptions. In Bayesian probability theory this is usually the rule of classical conditionalization or the more general rule of Jeffrey conditionalization. (Bayes’ theorem alone doesn’t say anything about updating. Bayes’ rule = classical conditionalization + Bayes’ theorem)
Regarding the utility of a, you write the probability part in the sum is P(ω|a)−P(ω). But it is actually just P(ω|a)!
To see this, start with the desirability axiom:
U(A∨B) = (P(A)U(A) + P(B)U(B)) / (P(A) + P(B))
This doesn’t tell us how to calculate U(A), only U(A∨B). But we can write A as the logically equivalent (A∧B)∨(A∧¬B)). This is a disjunction, so we can apply the desirability axiom:
U(A) = U((A∧B)∨(A∧¬B)) = (P(A∧B)U(A∧B) + P(A∧¬B)U(A∧¬B)) / (P(A∧B) + P(A∧¬B))
This is equal to
U(A) = (P(A∧B)U(A∧B) + P(A∧¬B)U(A∧¬B)) / P(A).
Since P(A∧B)/P(A) = P(B|A), we have
U(A)=P(B|A)U(A∧B)+P(¬B|A)U(A∧¬B).
Since A was chosen arbitrarily, it can be any proposition whatsoever. And since in Jeffrey’s framework we only consider propositions, all actions are also described by propositions. Presumably of the form “I now do x”. Hence,
U(a)=P(B|a)U(a∧B)+P(¬B|a)U(a∧¬B)
for any B.
This proof could also be extended to longer disjunctions between mutually exclusive propositions apart from B and ¬B. Hence, for a set S of mutually exclusive propositions s,
U(a) = ∑_{s∈S} P(s|a) U(a∧s).
The set Ω, the "set of all outcomes", is a special case of S where the probabilities of the mutually exclusive elements ω of Ω sum to 1. One interpretation is to regard each ω as describing one complete possible world. So,
U(a) = ∑_{ω∈Ω} P(ω|a) U(a∧ω).
But of course this holds for any proposition, not just an action a. This is the elegant thing about Jeffrey’s decision theory which makes it so general: He doesn’t need special types of objects (acts, states of the world, outcomes etc) and definitions associated with those.
Regarding the general formula for U(A∨B). Your suggestion makes sense, I also think it should be expressible in terms of U(A), U(B), and U(A∧B). I think I’ve got a proof.
Consider
(A∧B)∨(A∧¬B)∨(¬A∧B)∨(¬A∧¬B)=⊤.
The disjuncts are mutually exclusive. By the expected utility hypothesis (which should be provable from the desirability axiom) and by the U(⊤)=0 assumption, we have, writing E(U(X)) for P(X)U(X),
0=E(U(A∧B))+E(U(A∧¬B))+E(U(¬A∧B))+E(U(¬A∧¬B)).
Then subtract the last term:
−E(U(¬A∧¬B))=E(U(A∧B))+E(U(A∧¬B))+E(U(¬A∧B)).
Now since E(U(A))+E(U(¬A))=0 for any A, we have E(U(¬A))=−E(U(A)). Hence, −E(U(¬A∧¬B))=E(U(¬(¬A∧¬B))). By De Morgan, ¬(¬A∧¬B)=A∨B. Therefore
E(U(A∨B))=E(U(A∧B))+E(U(A∧¬B))+E(U(¬A∧B)).
Now add E(U(A∧B)) to both sides:
E(U(A∨B))+E(U(A∧B))=2E(U(A∧B))+E(U(A∧¬B))+E(U(¬A∧B)).
Notice that A=(A∧B)∨(A∧¬B) and B=(A∧B)∨(¬A∧B). Therefore we can write
E(U(A∨B))+E(U(A∧B))=E(U(A))+E(U(B)).
Now subtract E(U(A∧B)) and we have
E(U(A∨B))=E(U(A))+E(U(B))−E(U(A∧B)).
which is equal to
P(A∨B)U(A∨B)=P(A)U(A)+P(B)U(B)−P(A∧B)U(A∧B).
So we have
U(A∨B) = (P(A)U(A) + P(B)U(B) − P(A∧B)U(A∧B)) / P(A∨B),
and hence our theorem
U(A∨B) = (P(A)U(A) + P(B)U(B) − P(A∧B)U(A∧B)) / (P(A) + P(B) − P(A∧B)),
which we can also write as
U(A∨B)=P(A|A∨B)U(A)+P(B|A∨B)U(B)−P(A∧B|A∨B)U(A∧B).
Success!
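A quick numeric check of the theorem on a toy space with made-up probabilities and desirabilities (renormalized so that U(⊤)=0), with A and B deliberately overlapping:

```python
from math import isclose

# Toy space with made-up probabilities and desirabilities, shifted so U(⊤) = 0.
P = {"w1": 0.10, "w2": 0.25, "w3": 0.15, "w4": 0.50}
u_raw = {"w1": 8.0, "w2": -3.0, "w3": 4.0, "w4": 1.0}
shift = sum(P[w] * u_raw[w] for w in P)
u = {w: u_raw[w] - shift for w in P}

def prob(X):
    return sum(P[w] for w in X)

def U(X):
    return sum(P[w] * u[w] for w in X) / prob(X)

A = {"w1", "w2"}
B = {"w2", "w3"}          # overlaps A in w2, so P(A∧B) ≠ 0
AorB = A | B              # set union, i.e. the proposition A∨B

theorem = (prob(A) * U(A) + prob(B) * U(B) - prob(A & B) * U(A & B)) / (prob(A) + prob(B) - prob(A & B))
assert isclose(U(AorB), theorem)
print(U(AorB), theorem)
```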
Okay, now with U(A∨B) solved, what about the definition of U(A|B)? I think I got it:
U(A|B):=U(A∧B)−U(B)
This correctly predicts that U(A|A)=0. And it immediately leads to the plausible consequence U(A∧B)=U(A|B)+U(B). I don’t know how to further check whether this is the right definition, but I’m pretty sure it is.
Some first reflections on the results before I go into examining all the steps.
Hmm, yes my expression seems wrong when I look at it a second time. I think I still confused the timesteps and should have written
U(a) = ∑_{ω∈Ω} (P(ω|a) U(ω∧a) − P(ω) U(ω))
The extra negation comes from a reflex from when I'm not using Jeffrey's decision theory. With Jeffrey's decision theory it reduces to your expression, as the negated terms sum to U(⊤)=0. But still, I probably should learn not to guess at theorems and to properly do all the steps in the future. I suppose that is a point in favor of Jeffrey's decision theory, that the expressions usually are cleaner.
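A small numeric check, with made-up numbers, that the two expressions for U(a) coincide once U(⊤)=0 (the subtracted terms sum to exactly U(⊤)):

```python
from math import isclose

# Toy worlds with made-up numbers, shifted so that U(⊤) = Σ P(ω)u(ω) = 0.
P = {"w1": 0.2, "w2": 0.3, "w3": 0.1, "w4": 0.4}
u_raw = {"w1": 2.0, "w2": -1.0, "w3": 5.0, "w4": 0.0}
shift = sum(P[w] * u_raw[w] for w in P)
u = {w: u_raw[w] - shift for w in P}

a = {"w1", "w3"}                       # an action, viewed as a proposition (set of worlds)
P_a = sum(P[w] for w in a)

# For a complete world ω ∈ a we have ω∧a = ω; for ω ∉ a the P(ω|a) term vanishes.
with_subtraction = sum((P[w] / P_a) * u[w] for w in a) - sum(P[w] * u[w] for w in P)
without_subtraction = sum((P[w] / P_a) * u[w] for w in a)

assert isclose(sum(P[w] * u[w] for w in P), 0.0, abs_tol=1e-12)   # U(⊤) = 0 after the shift
assert isclose(with_subtraction, without_subtraction, abs_tol=1e-12)
print(with_subtraction, without_subtraction)
```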
As for your derivation, you used that P(A|B)+P(A|¬B)=P(A), but that is not the case for general S. This is a note to self to check whether this still holds for S⊊Ω.
Edit: My writing is confused here, disregard it. My conclusion is still
Your expression for U(A∨B) is nice
and what I would have expected. The problem I had was that I didn't realize that (s1∧s2)∨(s2∧s3) is not equivalent to s1∨s2∨s3 (which should have been obvious). Furthermore, your expression checks out with my toy example (if I remove the false expectation I had before).
Consider a lottery where you guess a sequence of 3 numbers, and s1, s2 and s3 are the corresponding propositions that you guessed each number correctly (assumed independent), with A=s1∧s2 and B=s2∧s3. You only have preferences over whether you win (W) or lose (L).
U(A∨B) = (P(s1∧s2)U(s1∧s2) + P(s2∧s3)U(s2∧s3) − P(s1∧s2∧s3)U(s1∧s2∧s3)) / P(A∨B)
= (P(s1∧s2)(P(s3)U(W) + P(¬s3)U(L)) + P(s2∧s3)(P(s1)U(W) + P(¬s1)U(L)) − P(s1∧s2∧s3)U(W)) / P(A∨B)
= (P(s1∧s2∧s3)U(W) + P(s1∧s2∧¬s3)U(L) + P(¬s1∧s2∧s3)U(L)) / P(A∨B)
= P(W|A∨B)U(W) + P(L|A∨B)U(L) = U(A∨B)
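And a numeric sanity check of this lottery computation, with made-up probabilities for the three guesses (assumed independent) and arbitrary win/lose desirabilities, renormalized so that U(⊤)=0:

```python
from itertools import product
from math import isclose

# Made-up, independent chances of guessing each of the three numbers.
p = {1: 0.10, 2: 0.20, 3: 0.30}

# Worlds: which of the three guesses are correct.
worlds = list(product([True, False], repeat=3))
P = {}
for w in worlds:
    pr = 1.0
    for i in (1, 2, 3):
        pr *= p[i] if w[i - 1] else 1.0 - p[i]
    P[w] = pr

def wins(w):
    return all(w)                                        # you win only if all three guesses are right

u_raw = {w: (1.0 if wins(w) else 0.0) for w in worlds}   # arbitrary win/lose desirabilities
shift = sum(P[w] * u_raw[w] for w in worlds)
u = {w: u_raw[w] - shift for w in worlds}                # shift so that U(⊤) = 0

def prob(X):
    return sum(P[w] for w in X)

def U(X):
    return sum(P[w] * u[w] for w in X) / prob(X)

A = {w for w in worlds if w[0] and w[1]}                 # s1 ∧ s2
B = {w for w in worlds if w[1] and w[2]}                 # s2 ∧ s3
AorB = A | B
W = {w for w in worlds if wins(w)}
L = set(worlds) - W

theorem = (prob(A) * U(A) + prob(B) * U(B) - prob(A & B) * U(A & B)) / prob(AorB)
by_outcome = (prob(W & AorB) / prob(AorB)) * U(W) + (prob(L & AorB) / prob(AorB)) * U(L)
assert isclose(U(AorB), theorem) and isclose(U(AorB), by_outcome)
print(U(AorB), theorem, by_outcome)
```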
I don't understand what you mean in the beginning here: how is ∑_{ω∈Ω} (P(ω|a)U(ω∧a) − P(ω)U(ω)) the same as ∑_{ω∈Ω} P(ω|a)U(ω∧a)?
U(⊤) = ∑_{ω∈Ω} P(ω)U(ω) = 0, that was one of the premises, no? You expect utility 0 from your prior.
Oh yes, of course! (I probably thought this was supposed to be valid for our S as well, which is assumed to be mutually exclusive, but, unlike Ω, not exhaustive.)
General S (even if mutually exclusive) is tricky; I'm not sure the expression is as nice then.
But we have my result above, i.e.
This proof could also be extended to longer disjunctions between mutually exclusive propositions apart from B and ¬B. Hence, for a set S of mutually exclusive propositions s, U(a) = ∑_{s∈S} P(s|a) U(a∧s).
which does not rely on the assumption of ∑_{s∈S} P(s)U(s) being equal to 0. After all, I only used the desirability axiom for the derivation, not the assumption U(⊤)=0. So we get a "nice" expression anyway as long as our disjunction is mutually exclusive. Right? (Maybe I misunderstood your point.)
Regarding U(A|B), I am now no longer sure that U(A|B):=U(A∧B)−U(B) is the right definition.
Maybe we instead have E[U(A|B)] := E[U(A∧B)] − E[U(B)], in which case it would follow that U(A|B) := (P(A∧B)U(A∧B) − P(B)U(B)) / P(A|B).
They are both compatible with U(A|A)=0, and I’m not sure which further plausible conditions would have to be met and which could decide which is the right definition.
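For what it's worth, a small numeric comparison of the two candidates on a toy space with made-up numbers: both give U(A|A)=0, but they assign different values to U(A|B) in general:

```python
# Comparing the two candidate definitions of conditional utility on a toy space
# (made-up probabilities and desirabilities, shifted so U(⊤) = 0).
P = {"w1": 0.1, "w2": 0.2, "w3": 0.3, "w4": 0.4}
u_raw = {"w1": 5.0, "w2": 1.0, "w3": -2.0, "w4": 0.5}
shift = sum(P[w] * u_raw[w] for w in P)
u = {w: u_raw[w] - shift for w in P}

def prob(X):
    return sum(P[w] for w in X)

def U(X):
    return sum(P[w] * u[w] for w in X) / prob(X)

def cand1(A, B):
    # U(A|B) := U(A∧B) − U(B)
    return U(A & B) - U(B)

def cand2(A, B):
    # U(A|B) := (P(A∧B)U(A∧B) − P(B)U(B)) / P(A|B)
    return (prob(A & B) * U(A & B) - prob(B) * U(B)) / (prob(A & B) / prob(B))

A = {"w1", "w2"}
B = {"w2", "w3"}

assert abs(cand1(A, A)) < 1e-12 and abs(cand2(A, A)) < 1e-12   # both give U(A|A) = 0
print(cand1(A, B), cand2(A, B))                                # but they differ in general
```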
Didn’t you use that B∨¬B=⊤. I can see how to extend the derivation for more steps s1∨s2∨…∨sn but only if {si}ni=1=Ω. The sums
∑s∈SP(s|a)U(a∧s)
and
∑ω∈ΩP(ω|a)U(a∧ω)
for arbitrary U are equal if and only if P(Ω∖S|a)=0.
The other alternative I see is if (and I’m unsure about this) we assume that U(z∧a)=U(z) and P(z|a)=P(z) for z∈Ω∖S.
What I would think U(A|B) should mean is U(A) after we've updated probabilities and utilities on the fact that B is certain. I think that would be the first one, but I'm not sure; I can't tell which one that would be.
Yeah, you are right. I used the fact that A↔((A∧B)∨(A∧¬B)). This makes use of the fact that B and ¬B are both mutually exclusive and exhaustive, i.e. (B∧¬B)↔⊥ and (B∨¬B)↔⊤. For S={s1,s2}, where s1 and s2 are mutually exclusive but not exhaustive, A is not equivalent to (A∧s1)∨(A∧s2), since A can be true without either s1 or s2 being true.
It should however work if P(A↔(s1∨s2))=1, since then P((A∧s1)∨(A∧s2))=1. So for U(A) = ∑_{s∈S} P(s|A) U(s∧A) to hold, S would have to be a "partition" of A, exhaustively enumerating all the incompatible ways it can be true.
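A small numeric illustration of that partition condition with made-up numbers: the sum over S reproduces U(A) exactly when S exhausts A, and misses part of A otherwise:

```python
from math import isclose

# Toy worlds with made-up numbers (shifted so U(⊤) = 0).
P = {"w1": 0.1, "w2": 0.2, "w3": 0.3, "w4": 0.4}
u_raw = {"w1": 3.0, "w2": -2.0, "w3": 1.0, "w4": 4.0}
shift = sum(P[w] * u_raw[w] for w in P)
u = {w: u_raw[w] - shift for w in P}

def prob(X):
    return sum(P[w] for w in X)

def U(X):
    return sum(P[w] * u[w] for w in X) / prob(X)

s1, s2 = {"w1"}, {"w2"}                  # mutually exclusive, but not exhaustive

def sum_over_S(A, S):
    # Σ_{s∈S} P(s|A) U(s∧A), skipping terms with P(s∧A) = 0
    return sum((prob(s & A) / prob(A)) * U(s & A) for s in S if prob(s & A) > 0)

A_partitioned = {"w1", "w2"}             # A ↔ s1∨s2, so {s1, s2} partitions A
A_larger = {"w1", "w2", "w3"}            # A can be true without s1 or s2

assert isclose(U(A_partitioned), sum_over_S(A_partitioned, [s1, s2]))
print(U(A_larger), sum_over_S(A_larger, [s1, s2]))   # these differ in general
```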
Regarding conditional utility, I agree. This would mean that U(A∧B)=U(A|B) if P(B)=1. I found an old paper by someone who analyzes conditional utility in detail, though with zero citations according to Google Scholar. Unfortunately the paper is hard to read because of its eccentric notation, and the author, an economist, was apparently only aware of Savage's more complicated utility theory (which has acts, states of the world, and prospects), so he doesn't work in Jeffrey's simpler and more general theory. But his conclusions seem intriguing, since he e.g. also says that U(A|A)=0, despite, as far as I know, Savage not having an axiom which demands utility 0 for certainty. Unfortunately I really don't understand his notation and I'm not quite an expert on Savage either...
A⟺((A∧s1)∨(A∧s2)) I agree with as a sufficient criterion to only sum over {s1,s2}; the other steps I'll have to think about before I get them.
I found this newer paper https://personal.lse.ac.uk/bradleyr/pdf/Unification.pdf and, having skimmed it, it seems to have similar premises, but they define U(A|B)=U(A∧B)−U(B)+U(⊤)=U(A∧B)−U(B) (instead of deriving it).
Thanks for the Bradley reference. He does indeed work in Jeffrey’s framework. On conditional utility (“conditional desirability”, in Jeffrey terminology) Bradley references another paper from 1999 where he goes into a bit more detail on the motivation:
To arrive at our candidate expression for conditional desirabilities in terms of unconditional ones, we reason as follows. Getting the news that XY is true is just the same as getting both the news that X is true and the news that Y is true. But DesXY is not necessarily equal to DesX + DesY because of the way in which the desirabilities of X and Y might depend on one another. Unless X and Y are probabilistically independent, for instance, the news that X is true will affect the probability and, hence, the desirability of Y. Or it might affect the desirability of Y directly, because it is the sort of condition that makes Y less or more desirable. It is natural then to think of DesXY as equal, not to the sum of the desirabilities of X and Y, but to the sum of the desirability of X and the desirability of Y given that X is true.
(With DesXY he means U(X∧Y).)
I also found a more recent (2017) book from him, where he defines U(A|B):=U(A∧B)−U(B) and where he uses the probability axioms, Jeffrey’s desirability axiom, and U(⊤)=0 as axioms. So pretty much the same way we did here.
So yeah, I think that settles conditional utility.
In the book Bradley has also some other interesting discussions, such as this one:
[...] Richard Jeffrey is often said to have defended a specific one, namely the ‘news value’ conception of benefit. It is true that news value is a type of value that unambiguously satisfies the desirability axioms. Consider getting the news that a trip to the beach is planned and suppose that one enjoys the beach in sunny weather but hates it in the rain. Then, whether this is good news or not will depend on how likely it is that it is going to be sunny or rainy. If you like, what the news means for you, what its implications are, depends on your beliefs. If it’s going to rain, then the news means a day of being wet and cold; if it’s going to be sunny, then the news means an enjoyable day swimming. In the absence of certainty about the weather, one’s attitude to the prospect will lie somewhere between one’s attitude to these two prospects, but closer to the one that is more probable. This explains why news value should respect the axiom of desirability. It also gives a rationale for the axiom of normality, for news that is certain is no news at all and hence cannot be good or bad.
Nonetheless, considerable caution should be exercised in giving Desirabilism this interpretation. In particular, it should not be inferred that Jeffrey’s claim is that we value something because of its news value. News value tracks desirability but does not constitute it. Moreover, it does not always track it accurately. Sometimes getting the news that X tells us more than just that X is the case because of the conditions under which we get the news. To give an extreme example: if I believe that I am isolated, then I cannot receive any news without learning that this is not the case. This ‘extra’ content is no part of the desirability of X.
Our main interest is in desirability as a certain kind of grounds for acting in conditions of uncertainty. In this respect, it perhaps more helpful to fix one’s intuitions using the concept of willingness to pay than that of news value. For if one imagines that all action is a matter of paying to have prospects made true, then the desirabilities of these prospects will measure (when appropriately scaled) the price that one is willing to pay for them. It is clear that one should not be willing to pay anything to make a tautology true and quite plausible that one should price the prospect of either X or Y by the sum of the probability-discounted prices of the each. So this interpretation is both formally adequate and exhibits the required relationship between desirability and action.
Anyway, someone should do a writeup of our findings, right? :)
Sure, I’ve found it to be an interesting framework to think in so I suppose someone else might too. You’re the one who’s done the heavy lifting so far so I’ll let you have an executive role.
If you want me to write up a first draft I can probably do it end of next week. I’m a bit busy for at least the next few days.
I think I will write a somewhat longer post as a full introduction to Jeffrey-style utility theory. But I'm still not quite sure on some things. For example, Bradley suggests that we can also interpret the utility of some proposition as the maximum amount of money we would pay (to God, say) to make it true. But I'm not sure whether that money would rather track expected utility (probability times utility) -- or not. Generally, the interpretation of expected utility versus the interpretation of utility is not quite clear to me yet. Have to think a bit more about it...
Isn't that just a question of whether you assume expected utility or not? In the general case it is only utility, not expected utility, that matters.
I’m not sure this is what you mean, but yes, in case of acts, it is indeed so that only the utility of an action matters for our choice, not the expected utility, since we don’t care about probabilities of, or assign probabilities to, possible actions when we choose among them, we just pick the action with the highest utility.
But only some propositions describe acts. I can't choose (make true/certain) that the sun shines tomorrow, so the probability of the sun shining tomorrow matters, not just its utility. Now if the utility of the sun shining tomorrow is the maximum amount of money I would pay for the sun shining tomorrow, is that plausible? Assuming the utility of sunshine tomorrow is a fixed value x, wouldn't I pay less money if sunshine is very likely anyway, and more if sunshine is unlikely?
On the other hand, I believe (but am uncertain) that the utility of a proposition being true moves towards 0 as its probability rises. (Which would correctly predict that I pay less for sunshine when it is likely anyway.) But I notice I don't have a real understanding of why or in which sense this happens! Specifically, we know that tautologies have utility 0, but I don't even see how to prove that it follows that all propositions with probability 1 (even non-tautologies) have utility 0. Jeffrey says it as if it's obvious, but he doesn't actually give a proof. And then, more generally, it also isn't clear to me why the utility of a proposition would move towards 0 as its probability moves towards 1, if that's the case.
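For what it's worth, here is one way to see the tendency, though not a full proof for the P(X)=1 case (where U(¬X) is no longer defined): from X∨¬X=⊤ and the desirability axiom, 0 = U(⊤) = P(X)U(X) + P(¬X)U(¬X), so U(X) = −(P(¬X)/P(X))·U(¬X), which shrinks towards 0 as P(X) grows if U(¬X) stays bounded. A toy illustration with made-up numbers, holding the underlying world desirabilities fixed and varying only P(X):

```python
# Two-world toy model: X is true in world x, false in world y.
# The raw desirabilities are held fixed (made-up numbers); only P(X) varies.
u_raw = {"x": 10.0, "y": 2.0}

def jeffrey_U_of_X(p_x):
    P = {"x": p_x, "y": 1.0 - p_x}
    shift = sum(P[w] * u_raw[w] for w in P)      # renormalize so that U(⊤) = 0
    u = {w: u_raw[w] - shift for w in P}
    U_X, U_notX = u["x"], u["y"]                 # single-world propositions
    # Check U(X) = −(P(¬X)/P(X))·U(¬X), from P(X)U(X) + P(¬X)U(¬X) = U(⊤) = 0.
    assert abs(U_X + (P["y"] / P["x"]) * U_notX) < 1e-9
    return U_X

for p in (0.1, 0.5, 0.9, 0.99, 0.999):
    print(p, jeffrey_U_of_X(p))                  # U(X) shrinks towards 0 as P(X) → 1
```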
I notice I’m still far from having a good level of understanding of (Jeffrey’s) utility theory...
So we have that
[...] Richard Jeffrey is often said to have defended a specific one, namely the 'news value' conception of benefit. It is true that news value is a type of value that unambiguously satisfies the desirability axioms.
but at the same time
News value tracks desirability but does not constitute it. Moreover, it does not always track it accurately. Sometimes getting the news that X tells us more than just that X is the case because of the conditions under which we get the news.
And I can see how, starting from this, you would get that U(⊤)=0. However, I think one of the remaining confusions is how you would go in the other direction: how do you go from the premise that we shift utilities to be 0 for tautologies to saying that we value something in large part by how unlikely it is?
And then we also have the desirability axiom
U(A∨B) = (P(A)U(A) + P(B)U(B)) / (P(A) + P(B))
for all A and B such that P(A∧B)=0 together with Bayesian probability theory.
What I was talking about in my previous comment goes against the desirability axiom, in the sense that I meant that for X = "sun with probability p and rain with probability (1−p)", in the more general case there could be subjects who prefer certain outcomes proportionally more (or less) than usual, such that U(X) ≠ pU(Sun) + (1−p)U(Rain) for some probabilities p. As the equality derives directly from the desirability axiom, it was wrong of me to generalise that far.
But, to get back to the confusion at hand, we need to unpack the tautology axiom a bit. If we say that a proposition ⊤ is a tautology if and only if P(⊤)=1 [1], then we can see that any proposition that is no news to us has zero utils as well.
And I think it is well to keep in mind that learning that, e.g., sun tomorrow is more probable than we once thought does not necessarily make us prefer sun tomorrow less; rather, the amount of utils for sun tomorrow has decreased (in an absolute sense). This fits in nicely with the money analogy, because you wouldn't buy something that you expect with certainty anyway [2], but this doesn't mean that you prefer it any less compared to some other, worse outcome that you expected some time earlier. It is just that we've updated from our observations such that the utility function now reflects our current beliefs. If you prefer A to B then this is a fact regardless of the probabilities of those outcomes. When the probabilities change, what is changing is the mapping from propositions to real numbers (the utility function), and it is only changing by a shift (and possibly a scaling) by a real number.
At least that is the interpretation that I’ve done.
[1] This seems reasonable but non-trivial to prove, depending on how we translate between logic and probability.
[2] If you do, you either don't actually expect it or have a bad sense of business.