Expected Utility, Geometric Utility, and Other Equivalent Representations

In Scott Garrabrant’s excellent Geometric Rationality sequence, he points out an equivalence between modelling an agent as

Maximizing the expected logarithm of some quantity $V$ , $E [\ln (V)]$
Maximizing the geometric expectation of $V$ , $G [V]$

And as we’ll show in this post, we can prove a geometric version of the VNM utility theorem:

An agent is VNM-rational if and only if there exists a function $V$ that:
- Represents the agent’s preferences over lotteries
  - $L ≺ M$ if and only if $V (L) < V (M)$
- Agrees with the geometric expectation of $V$
  - $V = G [V]$

Which in and of itself is a cool equivalence result, that $E [U]$ maximization $⟺$ VNM rationality $⟺$ $G [V]$ maximization. But it turns out these are just two out of a huge family of expectations we can use, like the harmonic expectation $H$ , which each have their own version of the VNM utility theorem. We can model agents as maximizing expected utility, geometric utility, harmonic utility, or whatever representation is most natural for the problem at hand.

Expected Utility Functions

The VNM utility theorem is that an agent satisfies the VNM axioms if and only if there exists a utility function $U : Δ Ω \to R$ which:

Represents that agent’s preferences over all lotteries
- $L ≺ M$ if and only if $U (L) < U (M)$
Agrees with its expected value
- $U = E [U]$

Where $Δ Ω$ is the set of probability distributions (lotteries) over outcomes $ω \in Ω$ .

The first property is easy to preserve. Given any strictly increasing function $f$ ,

$f (U (L)) < f (U (M))$ if and only if $U (L) < U (M)$

So $f \circ U$ also represents our agent’s preferences. But it’s only affine transformations $f (U) = a U + b$ that preserve the second property that $f (U) = E [f (U)]$ . And it’s only increasing affine transformations that preserve both properties at once.
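To make the affine claim concrete, here is a minimal numerical sketch. The outcome utilities, the lottery, and the helper names (`expectation`, `affine`, `cube`) are all made up for illustration. It compares applying a transformation before versus after taking the expectation.

```python
# Hypothetical example: three outcomes with made-up utilities and a made-up lottery.
utilities = [0.0, 1.0, 4.0]
lottery = [0.5, 0.3, 0.2]


def expectation(probs, values):
    """Probability-weighted arithmetic average over a finite set of outcomes."""
    return sum(p * v for p, v in zip(probs, values))


def affine(u, a=2.0, b=3.0):
    """An increasing affine transformation a*u + b (a > 0)."""
    return a * u + b


def cube(u):
    """A strictly increasing but non-affine transformation."""
    return u ** 3


expected_u = expectation(lottery, utilities)  # U(L) = E[U]

# Gap between "transform the expectation" and "take the expectation of the transform".
affine_gap = affine(expected_u) - expectation(lottery, [affine(u) for u in utilities])
cube_gap = cube(expected_u) - expectation(lottery, [cube(u) for u in utilities])

print("affine gap:", affine_gap)  # zero up to floating point error
print("cube gap:", cube_gap)      # clearly nonzero
```

Both transformations preserve the ordering of utilities, but in this example only the affine one still agrees with its own expected value.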

f-Utility Functions

But what if we were interested in other ways of aggregating utilities? One of the central points of Scott’s Geometric Rationality sequence is that in many cases, the geometric expectation $G$ is a more natural way to aggregate utility values $V$ into a single representative number $G [V]$ .

We can represent the same preferences using the expected logarithm $E [\ln (V)]$ , but this can feel arbitrary, and having to take a logarithm is a hint that these quantities $V$ are most naturally combined by multiplying them together. The expectation operator $E$ can emulate a weighted product, but the geometric expectation operator $G$ is a weighted product, and we can model an agent as maximizing $G [V]$ without ever bringing $E$ into the picture.

And as scottviteri asks:

If arithmetic and geometric means are so good, why not the harmonic mean? https://en.wikipedia.org/wiki/Pythagorean_means. What would a “harmonic rationality” look like?

They also link to a very useful concept I’d never seen before: the power mean, which generalizes many different types of average into one family, parameterized by a power $p$ . Set $p = 1$ and you’ve got the arithmetic mean $E$ . Take the limit as $p \to 0$ and you’ve got the geometric mean $G$ . And if you set $p = - 1$ you’ve got the harmonic mean $H$ .

It’s great! And I started to see if I could generalize my result to other values of $p$ . What is the equivalent of $G [V] = e^{E [\ln (V)]}$ for $H$ ? Well, scottviteri set me down the path towards learning about an even broader generalization of the idea of a mean, which captures the power mean as a special case: the quasi-arithmetic mean or $f$ -mean, since it’s parameterized by a function $f$ .

For our baseline definition, $f : I \to R$ will be a continuous, strictly increasing function that maps an interval $I$ of the real numbers to the real numbers $R$ . We’re also going to be interested in a weighted average, and in particular a probability weighted average over outcomes $ω \in Ω$ . We’ll use the notation $ω \sim P$ to denote sampling $ω$ from the probability distribution $P$ .

Here’s the definition of the $f$ -expectation $M_{f}$ :

$M_{f} [V] = f^{- 1} (E [f \circ V])$

$M_{f, ω \sim P} [V (ω)] = f^{- 1} (E_{ω \sim P} [f (V (ω))])$

Which for finite sets of outcomes looks like:

$M_{f, ω \sim P} [V (ω)] = f^{- 1} (\sum_{ω \in Ω} P (ω) f (V (ω)))$

So for example:

If $I = R$ and $f (V) = V$ (or any increasing affine transformation $f (V) = a V + b$ , where $a > 0$ ), then $M_{f}$ is the arithmetic expectation $M_{f} = E$ .
If $I = R^{> 0}$ , the positive real numbers, and $f (V) = \ln (V)$ (or any logarithm $f (V) = \log_{b} (V)$ where $b > 0$ and $b \neq 1$ ), then $M_{f}$ is the geometric expectation $M_{f} = G$ .
- We can also extend $f$ to include $f (0) = \lim_{V \to 0} f (V) = - \infty$ , to cover applications like Kelly betting. This will still be a strictly increasing bijection, and I expect it will work with any result that relies on $f$ being continuous.
If $I = R^{> 0}$ , and $f (V) = \frac{- 1}{V}$ , then $M_{f}$ is the harmonic expectation $M_{f} = H$ .
- Using $\frac{1}{V}$ would also compute the harmonic expectation, but $\frac{- 1}{V}$ is strictly increasing. And that lets us frame our agent as always maximizing a utility function.
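If $I = R^{> 0}$ , and $f (V) = V^{p}$ with $p > 0$ , then $M_{f}$ is the power expectation $M_{f} = M_{p}$ using the power $p$ . (For $p < 0$ , using $f (V) = - V^{p}$ keeps $f$ strictly increasing and computes the same power expectation, just as with the harmonic case above.)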
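The examples above can be sketched in a few lines of Python for a finite set of outcomes. This is only an illustration of the definition $M_{f} [V] = f^{- 1} (E [f \circ V])$ . The function name `f_expectation` and the sample probabilities and values are my own assumptions, not anything from the original sequence.

```python
import math


def f_expectation(f, f_inv, probs, values):
    """M_f[V] = f^{-1}(E[f(V)]) for a finite set of outcomes.

    f must be continuous and strictly increasing on the range of `values`,
    and f_inv must be its inverse. The caller is trusted to supply a valid pair.
    """
    return f_inv(sum(p * f(v) for p, v in zip(probs, values)))


# Hypothetical lottery: two equally likely outcomes with values 1 and 4.
probs = [0.5, 0.5]
values = [1.0, 4.0]

arithmetic = f_expectation(lambda v: v, lambda u: u, probs, values)
geometric = f_expectation(math.log, math.exp, probs, values)
harmonic = f_expectation(lambda v: -1.0 / v, lambda u: -1.0 / u, probs, values)
quadratic = f_expectation(lambda v: v ** 2, math.sqrt, probs, values)  # power p = 2

print(arithmetic, geometric, harmonic, quadratic)
```

For these particular numbers the four results come out ordered as harmonic, geometric, arithmetic, quadratic, which is the usual ordering of power means.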

An $f$ -utility function $V$ represents an agent’s preferences:

$L ≺ M$ if and only if $V (L) < V (M)$

And agrees with the $f$ -expectation of $V$

$V = M_{f} [V]$

It turns out that for every $f$ -utility function $V$ , there is a corresponding expected utility function $U$ , and vice versa. We’ll prove this more rigorously in the next sections, but it turns out that these are equivalent ways of representing the same preferences.

Equivalence of Maximization

Here’s the core insight that powers the rest of this equivalence result, which Scott articulates here:

Maximization is invariant under applying a monotonic function… So every time we maximize an expectation of a logarithm, this was equivalent to just maximizing the geometric expectation.

If we think of an agent as maximizing over $π \in Π$ , something under their control like their action or policy, we can write this as:

$\arg\max_{π \in Π} G [V] = \arg\max_{π \in Π} e^{E [\ln (V)]}$

$\arg\max_{π \in Π} G [V] = \arg\max_{π \in Π} E [\ln (V)]$

So for every geometric utility function $V$ , there is a corresponding expected utility function $U = \ln (V)$ which gives the same result if maximized.

This equivalence can be generalized to all $f$ -utility functions, and it follows from exactly the same reasoning. We have a strictly increasing function $f$ , and so $f^{- 1}$ must be strictly increasing as well.

If $V_{1} < V_{2} ⟺ f (V_{1}) < f (V_{2})$
Then, writing $U_{i} = f (V_{i})$ , $U_{1} < U_{2} ⟺ f^{- 1} (U_{1}) < f^{- 1} (U_{2})$

And so $\arg\max$ will ignore either one. Let’s use that to simplify the expression for maximizing the $f$ -expectation of $V$ :

$\arg\max_{π \in Π} M_{f} [V] = \arg\max_{π \in Π} f^{- 1} (E [f \circ V])$

$\arg\max_{π \in Π} M_{f} [V] = \arg\max_{π \in Π} E [f \circ V]$

And this suggests a substitution that will turn out to be extremely useful:

$U = f \circ V$

$\arg\max_{π \in Π} M_{f} [V] = \arg\max_{π \in Π} E [U]$

There is a function $U$ whose expectation we can maximize, and this is equivalent to maximizing the $f$ -expectation of our $f$ -utility function $M_{f} [V]$ . And we’ll show that $U$ is indeed an expected utility function! Similarly, we can apply $f^{- 1}$ to both sides to get a suggestion (which turns out to work) for how to turn an expected utility function into an equivalent $f$ -utility function.

$f \circ V = U$

$f^{- 1} \circ f \circ V = f^{- 1} \circ U$

$V = f^{- 1} \circ U$
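As a sanity check on the claim that maximizing $G [V]$ and maximizing $E [\ln (V)]$ pick out the same policy, here is a small sketch using a Kelly-style bet. Everything specific here is an assumption for illustration: a hypothetical even-odds bet won with probability 0.6, a policy that is just the fraction of wealth staked, and a coarse grid search standing in for the maximization over policies.

```python
import math

WIN_PROB = 0.6  # hypothetical probability of winning an even-odds bet
PROBS = [WIN_PROB, 1.0 - WIN_PROB]

# Candidate policies: stake 0%, 1%, ..., 99% of current wealth.
fractions = [i / 100 for i in range(100)]


def wealth_outcomes(fraction):
    """Wealth multiplier V after a win and after a loss."""
    return [1.0 + fraction, 1.0 - fraction]


def expected_log(fraction):
    """E[ln(V)]: the expected utility U = ln(V)."""
    return sum(p * math.log(v) for p, v in zip(PROBS, wealth_outcomes(fraction)))


def geometric_expectation(fraction):
    """G[V] computed directly as a probability-weighted product."""
    return math.prod(v ** p for p, v in zip(PROBS, wealth_outcomes(fraction)))


best_by_log = max(fractions, key=expected_log)
best_by_geometric = max(fractions, key=geometric_expectation)

print(best_by_log, best_by_geometric)
```

Note that `geometric_expectation` never takes a logarithm, so the two searches are genuinely different computations that land on the same stake.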

Duality

f-Utility Functions Correspond to Expected Utility Functions

It turns out that for every expected utility function $U$ , there is a corresponding $f$ -utility function $V$ , and vice versa. And this duality is given by $f$ and $f^{- 1}$ .

We’ll start by showing how to go from $U$ to $V$ . Given an expected utility function $U$ , we’ll define $V$ to be

$V = f^{- 1} \circ U$

We know from the VNM expected utility theorem that

$U = E [U]$

Let’s plug both of those into the definition of $M_{f}$

$M_{f} [V] = f^{- 1} (E [f \circ V])$

$M_{f} [V] = f^{- 1} (E [f \circ f^{- 1} \circ U])$

$M_{f} [V] = f^{- 1} (E [U])$

$M_{f} [V] = f^{- 1} (U)$

$M_{f} [V] = V$

So $V$ agrees with $M_{f} [V]$ . And since $f$ is strictly increasing, and $U$ represents an agent’s preferences, so does $V$ .

$L ≺ M$ if and only if $V (L) < V (M)$

Which means $V$ is an $f$ -utility function!

This gives us one half of the VNM theorem for $f$ -utility functions. If an agent is VNM rational, their preferences can be represented using an $f$ -utility function $V$ .

Expected Utility Functions Correspond to f-Utility Functions

We can complete the duality by going the other way, starting from an $f$ -utility function $V$ and showing there is a unique corresponding expected utility function $U$ . We’ll define $U$ to be:

$U = f \circ V$

And we’ll plug that and the fact that $V = M_{f} [V]$ into the definition of $M_{f} [V]$ .

$M_{f} [V] = f^{- 1} (E [f \circ V])$

$M_{f} [V] = f^{- 1} (E [U])$

$V = f^{- 1} (E [U])$

$f \circ V = f (f^{- 1} (E [U]))$

$f \circ V = E [U]$

$U = E [U]$

And that’s it! Starting from an $f$ -utility function $V$ , we can apply a strictly increasing function $f$ to get an expected utility function $U$ which represents the same preferences and agrees with $E [U]$ .

This also gives us the other half of the VNM theorem for $f$ -utility functions. If an agent’s preferences can be represented using an $f$ -utility function $V$ , they can be represented using an expected utility function $U$ , and that agent must therefore be VNM-rational.
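Here is a minimal numerical sketch of both directions of the duality for the geometric case, $f = \ln$ . The outcome utilities, the lotteries, and the helper names are hypothetical. I am also reading "agrees with the $f$ -expectation" as "the value of a mixture of lotteries equals the $f$ -expectation of the values of its components", which is an interpretation for the sake of the example.

```python
import math

# Hypothetical utilities for three outcomes.
outcome_utilities = [0.0, 1.0, 2.0]


def U(lottery):
    """Expected utility of a lottery, given as a list of outcome probabilities."""
    return sum(p * u for p, u in zip(lottery, outcome_utilities))


def V(lottery):
    """Geometric utility V = f^{-1} o U with f = ln, so f^{-1} = exp."""
    return math.exp(U(lottery))


def mix(weights, lotteries):
    """Compound lottery: pick lottery i with probability weights[i]."""
    n = len(lotteries[0])
    return [sum(w * lot[k] for w, lot in zip(weights, lotteries)) for k in range(n)]


def geometric_expectation(weights, values):
    """G[V] = exp(E[ln(V)]) over a finite set of weighted values."""
    return math.exp(sum(w * math.log(v) for w, v in zip(weights, values)))


lottery_a = [1.0, 0.0, 0.0]
lottery_b = [0.2, 0.3, 0.5]
weights = [0.25, 0.75]
compound = mix(weights, [lottery_a, lottery_b])

# U -> V: the geometric utility of the mixture equals G over its components.
v_of_compound = V(compound)
g_of_parts = geometric_expectation(weights, [V(lottery_a), V(lottery_b)])

# V -> U: applying f = ln to V recovers the expected utility.
u_recovered = math.log(v_of_compound)

print(v_of_compound, g_of_parts, u_recovered, U(compound))
```

The same check should go through with any other valid pair $f$ , $f^{- 1}$ swapped in for `math.log` and `math.exp`, as long as the utilities stay inside $f (I)$ .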

f as a Bijection of Utility Functions

So for every $f$ -utility function $V$ , we can apply $f$ and get an expected utility function $U$ . And the same is true in reverse when applying $f^{- 1}$ . Does this translation process have any collisions in either direction? Are there multiple $f$ -utility functions $V$ and $W$ that correspond to the same expected utility function $U$ , or vice versa?

It turns out that $f$ creates a one-to-one correspondence between $f$ -utility functions and expected utility functions. And a consequence of that is that all of these languages are equally expressive: there are no preferences we can model using an $f$ -utility function that we can’t model using an expected utility function, and vice versa.

Another way to frame this duality is to say that our translation function $f : I \to R$ is a bijection between its domain $I$ and its image $f (I)$ . And this induces a structure-preserving bijection between utility functions $U : Δ Ω \to R$ and $f$ -utility functions $V : Δ Ω \to I$ .

$V = f^{- 1} \circ U$

$U = f \circ V$

To show this, we can show that $f$ is injective and surjective between these sets of utility functions.

f is Injective

An injective function, also known as a one-to-one function, maps distinct elements in its domain to distinct elements in its codomain. In other words, injective functions don’t have any collisions. So in this case, we want to show that given two distinct $f$ -utility functions $V$ and $W$ , $f \circ V$ and $f \circ W$ must also be distinct.

$V \neq W ⟹ f \circ V \neq f \circ W$

Since $V$ and $W$ are distinct $f$ -utility functions, they must disagree about some input $ω$ .

$V (ω) \neq W (ω)$

And since $f$ is strictly increasing, it can’t map these different values in $I$ to the same value in $R$ .

$f (V (ω)) \neq f (W (ω))$

And thus $f \circ V$ must be a distinct function from $f \circ W$ .

f is Surjective

A surjective function maps its domain onto its entire codomain: every element of the codomain is the image of at least one element of the domain. These functions are also called “onto” functions. So in this case, we want to show that given an expected utility function $U$ , there is an $f$ -utility function $V$ such that $f \circ V = U$ . And this is exactly the $f$ -utility function that $f^{- 1}$ picks out.

$V = f^{- 1} \circ U$

$f \circ V = f \circ f^{- 1} \circ U$

$f \circ V = U$

And that’s it! $f$ induces a one-to-one correspondence between expected utility functions $U$ and $f$ -utility functions $V$ . We can freely translate between these languages and maximization will treat them all equivalently.

Composition

I also want to quickly show two facts about how $f$ -expectations combine together:

The $f$ -expectation of $f$ -expectations is another $f$ -expectation
- $M_{f} [M_{f} [V]] = M_{f} [V]$
The weights combine multiplicatively, as we’d expect from conditional probabilities
- Analogous to $P (A \land B) = P (A) P (B | A)$

All of this is going to reduce to an expectation of expectations, so let’s handle that first. Let’s say we have a family of $n$ probability distributions $P_{i} (ω)$ and expected utility functions $U_{i}$ . And then we sample $i \in [1 . . n]$ from a probability distribution I’ll suspiciously call $ψ$ .

$E_{i \sim ψ} [E_{ω \sim P_{i} (ω)} [U_{i} (ω)]] = E_{i \sim ψ} [\sum_{ω \in Ω} P_{i} (ω) U_{i} (ω)]$

$E_{i \sim ψ} [E_{ω \sim P_{i} (ω)} [U_{i} (ω)]] = \sum_{i \in [1 . . n]} ψ_{i} (\sum_{ω \in Ω} P_{i} (ω) U_{i} (ω))$

$E_{i \sim ψ} [E_{ω \sim P_{i} (ω)} [U_{i} (ω)]] = \sum_{i \in [1 . . n]} \sum_{ω \in Ω} ψ_{i} P_{i} (ω) U_{i} (ω)$

$E_{i \sim ψ} [E_{ω \sim P_{i} (ω)} [U_{i} (ω)]] = \sum_{(i, ω) \in [1 . . n] \times Ω} ψ_{i} P_{i} (ω) U_{i} (ω)$

Taking the expectation over $i$ of the expectation over $P_{i} (ω)$ is equivalent to taking the expectation over pairs $(i, ω)$ .^[1]

$P (i, ω) = ψ_{i} P_{i} (ω)$

$E_{i \sim ψ} [E_{ω \sim P_{i} (ω)} [U_{i} (ω)]] = E_{(i, ω) \sim P (i, ω)} [U_{i} (ω)]$

$E [E [U]] = E [U]$

This is one way to frame Harsanyi aggregation. Sample an agent according to a probability distribution $ψ$ , then evaluate their expected utility using that agent’s beliefs. The Harsanyi score is the expectation of expected utility, and the fact that nested expectations can be collapsed is exactly why aggregating this way satisfies the VNM axioms. The Harsanyi aggregate is VNM rational with respect to the joint probability distribution $P (i, ω)$ .

Knowing that, the general result for $f$ -expectations is even easier:

$M_{f, i \sim ψ} [M_{f, ω \sim P_{i} (ω)} [V_{i}]] = f^{- 1} (E_{i \sim ψ} [f \circ M_{f, ω \sim P_{i} (ω)} [V_{i}]])$

$M_{f, i \sim ψ} [M_{f, ω \sim P_{i} (ω)} [V_{i}]] = f^{- 1} (E_{i \sim ψ} [f \circ f^{- 1} (E_{ω \sim P_{i}} [f \circ V_{i}])])$

$M_{f, i \sim ψ} [M_{f, ω \sim P_{i} (ω)} [V_{i}]] = f^{- 1} (E_{i \sim ψ} [E_{ω \sim P_{i}} [f \circ V_{i}]])$

$M_{f, i \sim ψ} [M_{f, ω \sim P_{i} (ω)} [V_{i}]] = f^{- 1} (E_{(i, ω) \sim P (i, ω)} [f \circ V_{i}])$

$M_{f, i \sim ψ} [M_{f, ω \sim P_{i} (ω)} [V_{i}]] = M_{f, (i, ω) \sim P (i, ω)} [V_{i}]$

$M_{f} [M_{f} [V]] = M_{f} [V]$
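The collapse of nested $f$ -expectations is easy to check numerically. The sketch below uses made-up numbers for $ψ$ , $P_{i}$ and $V_{i}$ , and a hypothetical helper `nested_and_flat`. It compares the nested computation against a single $f$ -expectation over pairs $(i, ω)$ weighted by $ψ_{i} P_{i} (ω)$ , for both the geometric and harmonic cases.

```python
import math


def f_expectation(f, f_inv, probs, values):
    """M_f[V] = f^{-1}(E[f(V)]) for a finite set of outcomes."""
    return f_inv(sum(p * f(v) for p, v in zip(probs, values)))


# Hypothetical setup: two agents, each with their own beliefs and utilities
# over the same two outcomes.
psi = [0.3, 0.7]                   # probability of sampling each agent i
P = [[0.5, 0.5], [0.1, 0.9]]       # P[i][w]: agent i's probability of outcome w
V = [[1.0, 4.0], [2.0, 8.0]]       # V[i][w]: agent i's f-utility for outcome w

# Joint distribution over pairs (i, w), with weights psi_i * P_i(w).
pairs = [(i, w) for i in range(len(psi)) for w in range(len(P[i]))]
joint_probs = [psi[i] * P[i][w] for i, w in pairs]
joint_values = [V[i][w] for i, w in pairs]


def nested_and_flat(f, f_inv):
    """Return (M_f[M_f[V]], M_f over the joint distribution)."""
    inner = [f_expectation(f, f_inv, P[i], V[i]) for i in range(len(psi))]
    nested = f_expectation(f, f_inv, psi, inner)
    flat = f_expectation(f, f_inv, joint_probs, joint_values)
    return nested, flat


geo_nested, geo_flat = nested_and_flat(math.log, math.exp)
har_nested, har_flat = nested_and_flat(lambda v: -1.0 / v, lambda u: -1.0 / u)

print(geo_nested, geo_flat)
print(har_nested, har_flat)
```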

So for example, Scott motivates the idea of Kelly betting as the result of negotiating between different counterfactual versions of the same agent. In that framing, $G [V]$ naturally captures “Nash bargaining but weighted by probability.” If we geometrically aggregate these geometric expected utilities $G [G [V]]$ , the result is the same as one big geometric aggregate over all counterfactual versions of all agents, weighted by $P (i, ω) = ψ_{i} P_{i} (ω)$ . And the fact that we can model this aggregate as maximizing $G [V]$ means that it’s VNM-rational as well!

This is a very cool framing, and there are some phenomena that I think are easier to understand this way than using the expected utility lens. But since Geometric Rationality is Not VNM Rational, we know that the $G [G [V]]$ model won’t have all the cool features we want from a theory of geometric rationality, like actively preferring to randomize our actions in some situations.

Conclusions

With all of that under our belt, we can now reiterate the $f$ -Utility Theorem for VNM Agents. Which is that an agent is VNM-rational if and only if there exists a function $V : Δ Ω \to I$ that:

Represents the agent’s preferences over lotteries
- $L ≺ M$ if and only if $V (L) < V (M)$
Agrees with the $f$ -expectation of $V$
- $V = M_{f} [V]$

We can model a VNM-rational agent as maximizing expected utility $E [U]$ , geometric expected utility $G [V]$ , harmonic expected utility $H [V]$ , or any other $f$ -expectation $M_{f} [V]$ that’s convenient for analysis. We can translate these into expected utility functions, but we can also work with them in their native language.

We can also think of this equivalence as an impossibility result. We may want to model preferences that violate the VNM axioms, like a group preference for coin flips in cases where that’s more fair than guaranteeing any agent their most preferred outcome. And the equivalence of all these representations means that none of them can model such preferences.

One approach that does work is mixing these expectations together. Our best model of geometric rationality to my knowledge is $G [E [U]]$ ; the geometric expectation of expected utility. See Geometric Rationality is Not VNM Rational for more details, but the way I’d frame it in this sequence is that the inner expectation means that the set of feasible joint utilities $F$ is always convex.

And maximizing the geometric expectation, aka the geometric aggregate, always picks a Pareto optimum, which is unique as long as all agents have positive weight.

[Figure: a curve intersecting a Pareto frontier at a single point $p$ . Interactive version here.]
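To illustrate how $G [E [U]]$ can actively prefer randomization, here is a toy sketch. The setup is entirely hypothetical: two agents with equal weight, two outcomes, each agent getting utility 1 from their favorite outcome and 0 from the other, and a grid search over the probability of the first outcome.

```python
def expected_utilities(p_a):
    """Each agent's expected utility when outcome A has probability p_a.

    Hypothetical agents: agent 0 only values outcome A, agent 1 only values B.
    """
    return [p_a * 1.0 + (1.0 - p_a) * 0.0, p_a * 0.0 + (1.0 - p_a) * 1.0]


def geometric_aggregate(utilities, weights=(0.5, 0.5)):
    """G over agents: the weighted product of their expected utilities."""
    result = 1.0
    for u, w in zip(utilities, weights):
        result *= u ** w
    return result


# Candidate lotteries: probability of outcome A from 0.00 to 1.00.
candidates = [i / 100 for i in range(101)]
best_p = max(candidates, key=lambda p: geometric_aggregate(expected_utilities(p)))

score_coin_flip = geometric_aggregate(expected_utilities(0.5))
score_always_a = geometric_aggregate(expected_utilities(1.0))
score_always_b = geometric_aggregate(expected_utilities(0.0))

print(best_p, score_coin_flip, score_always_a, score_always_b)
```

In this toy case the fair coin flip scores strictly higher than either deterministic outcome. A VNM-rational agent values a mixture somewhere between its components, so it can never strictly prefer the mixture to both of them.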

Check out the main Geometric Utilitarianism post for more details, but I think of these equivalence results as telling us what we can do while staying within the VNM paradigm, and what it would take to go beyond it.

^[1] We went through the proof for discrete probability distributions here, but $E [E [U]] = E [U]$ holds for all probability distributions.