I finally deciphered this post just now, so I'll explain how I'm interpreting it for the convenience of future readers. Basically, we start in a world state with various timelines branching off of it: the points of the initial probability distribution. Each timeline has a particular utility (how much we like it) and a particular probability (how much we expect it). So you can sum utility times probability over all timelines to get the total expected value of the world state we're in right now.
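In symbols (my notation, not necessarily the post's): if timeline $i$ has probability $p_i$ and utility $u_i$, the expected value of the current state is

$$\mathbb{E}[U] = \sum_i p_i u_i.$$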
However, we have the option of taking some action, the "event" referenced in the post, which rules out some set of timelines. The remaining set of timelines, the ones we can restrict our future to by performing the action, accounts for some proportion of the total expected value of our current state. That proportion is Q(A): sum the expected value contributed by each timeline in the set, then divide by the expected value of the present state, which is the same as normalizing the present state's expected value to 1.
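Under that normalization (still my notation), Q(A) is the share of expected value carried by the timelines in A:

$$Q(A) = \frac{\sum_{i \in A} p_i u_i}{\sum_i p_i u_i}.$$

For this to behave like a probability distribution, the utilities need to be non-negative, which I'm assuming is part of the post's setup.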
If we perform the action, those timelines keep their relative probability weights, but with the other timelines ruled out, we rescale them to sum to 1, in the sense of a Bayesian update (our action is evidence that we're in that set of timelines rather than some other set). The rescaling divides each weight by the total proportion of probability mass the set had in our initial state, i.e. its total probability, which is P(A).
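Spelled out, P(A) is just the set's total prior probability, and the update divides each timeline's weight by it:

$$P(A) = \sum_{i \in A} p_i, \qquad p_{i \mid A} = \frac{p_i}{P(A)}.$$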
So Q(A)/P(A) is essentially a "score multiplier". If the action restricts the future to a set of timelines whose proportion of total expected value, from the perspective of the pre-action starting state, is greater than their total probability, this normalized expected value of the action will be greater than 1: we've improved our position and forced the universe into a world state that gives us a better bet than we had before. On the other hand, it could be less than 1 if we restrict to a set of timelines whose share of value is too low relative to their share of probability; then we've thrown away some potential value that was originally available to us.
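Equivalently, Q(A)/P(A) = E[U | A] / E[U], the conditional expected utility over the prior expected utility. Here's a minimal sketch of that bookkeeping in code, with made-up numbers and my own names for everything (none of this is from the post):

```python
def score_multiplier(p, u, A):
    """Q(A)/P(A) for timelines with probabilities p and utilities u,
    restricted to the index set A. Assumes sum(p) == 1 and all u[i] >= 0."""
    total_ev = sum(pi * ui for pi, ui in zip(p, u))  # E[U] of the current state
    Q = sum(p[i] * u[i] for i in A) / total_ev       # share of expected value in A
    P = sum(p[i] for i in A)                         # share of probability mass in A
    return Q / P                                     # == E[U | A] / E[U]

# Three timelines; the action restricts us to a subset.
p = [0.2, 0.3, 0.5]
u = [10.0, 4.0, 1.0]
print(score_multiplier(p, u, A={0, 1}))  # ~1.73 > 1: the restriction improves our bet
print(score_multiplier(p, u, A={2}))     # ~0.27 < 1: we threw away high-value timelines
```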
The fun thing is that since Q and P both look like probability distributions (ways of weighting timelines as proportions of the whole), we can modify them with linear transformations in such a way that the preference ordering of Q(A)/P(A) remains unchanged. But that's where my current understanding stops. I'll have to analyze the rest of the post to get a better sense of how that transformation works and why it would be useful.
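If I had to guess at the shape of that result: mixing the two distributions linearly, say $Q' = \alpha Q + (1 - \alpha) P$ for some $\alpha \in (0, 1]$, gives

$$\frac{Q'(A)}{P(A)} = \alpha \, \frac{Q(A)}{P(A)} + (1 - \alpha),$$

which is a positive affine function of the original ratio and so ranks actions identically. That's only my guess at the construction, though; the post may intend something different.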