The Geometric Expectation
A Suspicious Pattern
There is a pattern that shows up in many of the toys we like to play with around here: the pattern of maximizing the expected logarithm.
Nash bargaining is a method for aggregating preferences without a means to directly compare them. When Nash bargaining, you are maximizing the expected logarithm of utility, where the expectation is over uncertainty about which person you are.
Kelly betting is an extremely useful tool for not putting all your future wealth in one basket. When Kelly betting, you are maximizing the expected logarithm of your wealth.
The log scoring rule is a very natural way to extract beliefs. When maximizing your log score, you are maximizing the expectation of the logarithm of the probability you assign to the right answer. This is one example of a general pattern. Maximizations of expected logarithms show up all over information theory, often phrased as minimizing the negative of the expected logarithm.
Why does maximization of the expected logarithm keep showing up?
One answer is that all of the instances of it showing up are actually related. In my previous two posts, I made some connections between Nash bargaining and Kelly betting. The fact that Kelly betting can be used to model Bayesian updating illustrates its relationship with the information theory applications. To a certain extent, there is really only one instance of this pattern.
However, I think that there is another argument for why you should expect this pattern to show up a lot, which is that the pattern is very simple. Simpler than it looks on the surface. It only looks complicated because mathematicians have failed us.
The Geometric Integral
One of the most underrated concepts in mathematics is the geometric integral, given by $\prod_a^b f(x)^{dx} = e^{\int_a^b \ln(f(x))\,dx}$. (The fact that I couldn’t easily get a latex symbol that looks like an elongated P is a testament to its underratedness.) The geometric integral is just like the standard integral, but everywhere you would add, you multiply instead. Defining it in terms of the standard (arithmetic) integral with logs and exponents is insulting to its nature, and I don’t recommend thinking of it that way. (You wouldn’t define $x \cdot y$ as $e^{\ln(x)+\ln(y)}$.) Instead, you should just think of it as the multiplicative version of the integral. However, using logs and exponentiation is the fastest way to get the definition across.
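Since the definition may be easier to digest with numbers, here is a minimal sketch (mine, not from the post) that approximates the geometric integral of $f(x)=x$ on $[1,2]$ as a "Riemann product" of $f(x_i)^{dx}$ terms and compares it to the exp-of-integral-of-ln form; the midpoint grid and the choice of $f$ are arbitrary.

```python
# Approximate the geometric (product) integral of f(x) = x on [1, 2] two ways.
import numpy as np

a, b, n = 1.0, 2.0, 10_000
dx = (b - a) / n
xs = a + (np.arange(n) + 0.5) * dx    # midpoints of each small interval

f = xs                                 # f(x) = x on [1, 2]
riemann_product = np.prod(f ** dx)              # product of f(x_i)^dx
via_logs = np.exp(np.sum(np.log(f)) * dx)       # exp of the ordinary integral of ln f

print(riemann_product, via_logs)   # both ≈ exp(2*ln(2) - 1) ≈ 1.4715
```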
I think people don’t practice thinking multiplicatively enough, which causes them to throw inherently multiplicative things into logarithms, so they can think about them additively.
I will use the phrase geometric expectation when I take a geometric integral over a probability distribution, and I will use the symbol $\mathbb{G}$. Thus, we will write $\mathbb{G}_{x\sim P}[f(x)] = e^{\mathbb{E}_{x\sim P}[\ln(f(x))]}$.
Discrete Geometric Expectations
Luckily, most of the time, we will want to talk about discrete geometric expectations, where we can use (possibly infinite) sums rather than integrals and (possibly infinite) products rather than geometric integrals.
Let us gain some intuition for discrete geometric expectations by going through some simple cases. We will start with a uniform distribution on a finite set.
Let $X$ be a finite set with $n$ elements. Let $f$ be a function that assigns a nonnegative value $f(x)$ to each $x \in X$. Let $P$ be the uniform probability distribution on $X$ that assigns probability $1/n$ to each element of $X$.

We have that $\mathbb{E}_{x\sim P}[f(x)] = \frac{1}{n}\sum_{x\in X} f(x)$. This is just the average, or arithmetic mean, of the values.

We can compute $\mathbb{G}_{x\sim P}[f(x)]$ using the above formula $\mathbb{G}_{x\sim P}[f(x)] = e^{\mathbb{E}_{x\sim P}[\ln(f(x))]}$. Here, we get

$\mathbb{G}_{x\sim P}[f(x)] = e^{\frac{1}{n}\sum_{x\in X}\ln(f(x))} = \prod_{x\in X} f(x)^{1/n} = \sqrt[n]{\prod_{x\in X} f(x)}$.
Thus, the geometric expectation under the uniform distribution is just the geometric mean of the values. Hence the name.
The non-uniform (and possibly infinite) discrete case is not much more difficult. If $X$ is a finite or countably infinite set, $f$ assigns a nonnegative value to each $x \in X$, and $P$ is a probability distribution on $X$, then $\mathbb{E}_{x\sim P}[f(x)] = \sum_{x\in X} P(x) f(x)$, and

$\mathbb{G}_{x\sim P}[f(x)] = \prod_{x\in X} f(x)^{P(x)}$.
These two values can be thought of as a weighted arithmetic mean and weighted geometric mean respectively.
When taking the geometric expectation of $f$ with respect to $P$, you just take the product of $f(x)^{P(x)}$ over all $x$ in $X$. You are multiplying together all the values, but the exponent $P(x)$ is saying that values with less probability get less weight (or less “power”).
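For concreteness, here is a small sketch (with made-up numbers) of the weighted case: the geometric expectation is the product of the values raised to their probabilities, and it agrees with the exp-of-expected-ln form. With uniform probabilities it reduces to the ordinary geometric mean above.

```python
# Weighted arithmetic vs. geometric expectation for a three-outcome toy example.
import numpy as np

values = np.array([1.0, 4.0, 16.0])   # f(x) for three outcomes (made-up numbers)
probs = np.array([0.5, 0.25, 0.25])   # P(x)

arith = np.sum(probs * values)                         # E[f] = 5.5
geo = np.prod(values ** probs)                         # G[f] = 1^0.5 * 4^0.25 * 16^0.25 ≈ 2.83
geo_via_logs = np.exp(np.sum(probs * np.log(values)))  # same value via exp and ln
print(arith, geo, geo_via_logs)
```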
Maximizing the Geometric Expectation
Maximization is invariant under applying a monotonic function. Thus $\operatorname{argmax} \mathbb{G}_{x\sim P}[f(x)] = \operatorname{argmax} \ln\left(\mathbb{G}_{x\sim P}[f(x)]\right) = \operatorname{argmax} \mathbb{E}_{x\sim P}[\ln(f(x))]$.
So every time we maximize the expectation of a logarithm, this is equivalent to just maximizing the geometric expectation.
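A toy sketch (my own numbers, not from the post) of that equivalence: three candidate gambles, ranked once by geometric expectation and once by expected logarithm, pick out the same winner.

```python
# Ranking options by G[f] and by E[ln f] gives the same argmax.
import numpy as np

probs = np.array([0.5, 0.3, 0.2])
options = {                               # payoff under each of the three outcomes
    "A": np.array([2.0, 2.0, 2.0]),
    "B": np.array([9.0, 0.1, 0.1]),
    "C": np.array([4.0, 3.0, 0.5]),
}

geo = {k: np.prod(v ** probs) for k, v in options.items()}            # G[f]
explog = {k: np.sum(probs * np.log(v)) for k, v in options.items()}   # E[ln f]

print(max(geo, key=geo.get), max(explog, key=explog.get))  # same winner ("C")
# (The arithmetic expectation would pick the riskier "B" here instead.)
```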
Rather than saying “maximize the geometric expectation”, I will just say “geometrically maximize”. For example, when Kelly betting, we are just geometrically maximizing wealth. Note that the unit on the geometric expectation of wealth is dollars. The unit on the expected logarithm of dollars is… confusing? It is log dollars, but like, you add it instead of multiplying? I don’t know how it works. What even is a log dollar?
The geometric expectation just makes more sense than the expected logarithm. It is a real thing with a real meaning. However, when we put the geometric expectation inside of a maximization, and we don’t naturally think in terms of geometric expectations, we are tempted to take a logarithm of the whole thing (which we can do because the maximization eats the monotonic function) and end up maximizing the expected logarithm.
Geometric Rationality
When Kelly betting, you are really just geometrically maximizing wealth.
When Nash bargaining, you are really just geometrically maximizing expected utility with respect to your uncertainty about your identity. In defense of Nash bargaining, it is normally presented as maximizing the product of the utilities. However, if you don’t already have the concept of geometric expectation, it is tempting to convert it to an expected logarithm so you can handle the weighted case and think of it as being about uncertainty behind the veil of ignorance. (Also, it is more like the square root of the product of the utilities than the product of the utilities.)
When maximizing log score, you are really just geometrically maximizing the probability you assign to your observation.
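To make the Kelly case concrete, here is a sketch (my own toy setup, not from the post) that directly geometrically maximizes wealth over bet sizes and recovers the familiar Kelly fraction.

```python
# Geometrically maximize wealth over bet fractions for an even-money bet.
import numpy as np

p = 0.6            # chance of winning the even-money bet (a made-up number)
wealth0 = 1000.0   # starting wealth

def geo_wealth(frac):
    # Wealth after betting a fraction `frac`: up by frac if we win, down by frac if we lose.
    outcomes = np.array([wealth0 * (1 + frac), wealth0 * (1 - frac)])
    return np.prod(outcomes ** np.array([p, 1 - p]))   # G[wealth]

fracs = np.linspace(0.0, 0.99, 10_000)
best = fracs[np.argmax([geo_wealth(f) for f in fracs])]
print(best)        # ≈ 0.2, the usual Kelly fraction 2p - 1 for an even-money bet
```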
I will informally use the phrase “geometric rationality” to refer to techniques that tend to geometrically maximize natural features (of the world or the self). I want to draw attention to the hypothesis that humans are evolved to be naturally inclined towards geometric rationality over arithmetic rationality, and that around here, the local memes have moved us too far off this path.
A video on the geometric derivative by the ever excellent Michael Penn:
Edit:
The geometric derivative is the instantaneous exponential growth rate, i.e. $f^*(x) = \exp\left[f'(x)/f(x)\right]$, where $f^*(x)$ is the geometric derivative.

Which is equivalent to $f^*(x) = \exp\left[\frac{d}{dx}\ln(f(x))\right]$.

And if I pushed around symbols correctly, the geometric derivative can be pulled inside of a geometric expectation ($\nabla^*_\theta\, \mathbb{G}_{x\sim P(x)}[f_\theta(x)] = \mathbb{G}_{x\sim P(x)}[\nabla^*_\theta f_\theta(x)]$) similarly to how an additive derivative can be pulled inside an additive expectation ($\nabla_\theta\, \mathbb{E}_{x\sim P(x)}[f_\theta(x)] = \mathbb{E}_{x\sim P(x)}[\nabla_\theta f_\theta(x)]$). Also, just as additive expectation distributes over addition ($\mathbb{E}[f(x)+g(x)] = \mathbb{E}[f(x)] + \mathbb{E}[g(x)]$), geometric expectation distributes over multiplication ($\mathbb{G}[f(x)g(x)] = \mathbb{G}[f(x)]\,\mathbb{G}[g(x)]$).

I think what is going on here is that both $\nabla^*$ and $\mathbb{G}$ are of the form $\exp \circ\, g \circ \ln$ with $g = \nabla$ and $g = \mathbb{E}$, respectively. Let’s define the star operator as $g^* = \exp \circ\, g \circ \ln$. Then $(f \circ g)^* = \exp \circ (f \circ g) \circ \ln = \exp \circ f \circ \ln \circ \exp \circ\, g \circ \ln = f^* \circ g^*$, by associativity of function composition. Further, if $f$ and $g$ commute, then so do $f^*$ and $g^*$: $g^* \circ f^* = (g \circ f)^* = (f \circ g)^* = f^* \circ g^*$.

So the commutativity of the geometric expectation and derivative falls directly out of their representation as $\mathbb{E}^*$ and $\nabla^*$, respectively, by commutativity of $\mathbb{E}$ and $\nabla$, as long as they are over different variables.

We can also derive what happens when the expectation and gradient are over the same variables: $(\nabla_\theta \circ \mathbb{E}_{x\sim P_\theta(x)})^*$. First, notice that $(\cdot\, k)^*(x) = e^{k \ln x} = e^{\ln x \cdot k} = x^k$, so $(\cdot\, k)^* = (\wedge k)$. Also $(+k)^*(x) = e^{k + \ln(x)} = e^k e^{\ln(x)} = x e^k$, so $(+k)^* = (\cdot\, e^k)$.

Now let’s expand the composition of the gradient and expectation. $(\nabla_\theta \circ \mathbb{E}_{x\sim P_\theta(x)})(f(x)) = \nabla_\theta \int P_\theta(x) f(x)\, dx = \mathbb{E}_{x\sim P_\theta(x)}[\nabla_\theta(f(x)\ln P_\theta(x))]$, using the log-derivative trick. So $\nabla_\theta \circ \mathbb{E}_{x\sim P_\theta(x)} = \mathbb{E}_{x\sim P_\theta(x)} \circ \nabla_\theta \circ (\cdot \ln P_\theta(x))$.

Therefore, $\nabla^*_\theta \circ \mathbb{G}_{x\sim P_\theta(x)} = (\nabla_\theta \circ \mathbb{E}_{x\sim P_\theta(x)})^* = \mathbb{E}^*_{x\sim P_\theta(x)} \circ \nabla^*_\theta \circ (\cdot \ln P_\theta(x))^* = \mathbb{G}_{x\sim P_\theta} \circ \nabla^*_\theta \circ (\wedge \ln P_\theta)$.

Writing it out, we have $\nabla^*_\theta\, \mathbb{G}_{x\sim P_\theta(x)}[f(x)] = \mathbb{G}_{x\sim P_\theta(x)}\left[\nabla^*_\theta\left(f(x)^{\ln P_\theta(x)}\right)\right]$.
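A quick numerical spot-check of the "pull the geometric derivative inside" claim above (my own sketch; the function f, the distribution p, and the finite-difference step are arbitrary choices): when the distribution does not depend on θ, both sides give the same number.

```python
# Check: geometric derivative of G equals G of geometric derivative (P fixed).
import numpy as np

xs = np.array([0.0, 1.0, 2.0])
p = np.array([0.2, 0.5, 0.3])         # fixed distribution, independent of theta

def f(theta, x):
    return 1.0 + theta * x + x ** 2    # an arbitrary positive function

def geo_expect(values, probs):
    return np.exp(np.sum(probs * np.log(values)))

def geo_deriv(g, theta, h=1e-6):
    # Geometric derivative exp(g'(theta) / g(theta)), with g' from central differences.
    dg = (g(theta + h) - g(theta - h)) / (2 * h)
    return np.exp(dg / g(theta))

theta0 = 0.7
lhs = geo_deriv(lambda t: geo_expect(f(t, xs), p), theta0)
rhs = geo_expect(np.array([geo_deriv(lambda t: f(t, x), theta0) for x in xs]), p)
print(lhs, rhs)    # agree to around 1e-6
```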
This entire series and especially this post are excellent, thanks :)
Thanks for the post—I’ve been having thoughts in this general direction and found this post helpful. I’m somewhat drawn to geometric rationality because it gives more intuitive answers in thought experiments involving low probabilities of extreme outcomes, such as Pascal’s mugging. I also agree with your claim that “humans are evolved to be naturally inclined towards geometric rationality over arithmetic rationality.”
On the other hand, it seems like geometric rationality only makes sense in the context of natural features that cannot take on negative values. Most of the things I might want to maximize (e.g. utility) can be negative. Do you have thoughts on the extent to which we can salvage geometric rationality from this problem?
But if your utility function is bounded, as it apparently should be, then you’re one affine transform away from being able to use geometric rationality, no?
How much should you shift things by? The geometric argmax will depend on the additive constant.
If arithmetic and geometric means are so good, why not the harmonic mean? https://en.wikipedia.org/wiki/Pythagorean_means. What would a “harmonic rationality” look like?
I can answer this now!
Expected Utility, Geometric Utility, and Other Equivalent Representations
It turns out there is a large family of expectations we can use to build utility functions, including the arithmetic expectation E, the geometric expectation G, and the harmonic expectation H, and they’re all equivalent models of VNM rationality! And we need something beyond that family, like Scott’s G[E[U]], to formalize geometric rationality.
Thank you for linking to these different families of means! The quasi-arithmetic mean turned out to be exactly what I needed for this result.
Very interesting! I’m excited to read your post.
Also here is a nice family that parametrizes these different kinds of average (https://m.youtube.com/watch?v=3r1t9Pf1Ffk)
Actually maybe this family is more relevant:
https://en.wikipedia.org/wiki/Generalized_mean, where the geometric mean is the limit as the exponent approaches zero.
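A tiny sketch (my own) of the generalized (power) mean from that link, showing the arithmetic (p = 1) and harmonic (p = -1) means as special cases and the geometric mean emerging as p gets close to zero.

```python
# Power means of a fixed list, approaching the geometric mean as p -> 0.
import numpy as np

xs = np.array([1.0, 4.0, 16.0])   # made-up values

def power_mean(xs, p):
    return np.mean(xs ** p) ** (1.0 / p)

geometric = np.prod(xs) ** (1.0 / len(xs))    # = 4.0
for p in [1.0, 0.5, 0.1, 0.001, -1.0]:        # p = 1 arithmetic, p = -1 harmonic
    print(p, power_mean(xs, p))
print("geometric:", geometric)                # the p -> 0 limit of the power mean
```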
The “harmonic integral” would be the inverse of the integral of the inverse of a function—https://math.stackexchange.com/questions/2408012/harmonic-integral
Some results related to logarithmic utility and stock market leverage (I derived these after reading your previous post, but I think it fits better here):
Tl;dr: We can derive the optimal stock market leverage for an agent with utility logarithmic in money. We can also back-derive a utility function from any constant leverage[1], giving us a nice class of utility functions with different levels of risk-aversion. Logarithmic utility is recovered as a special case, and has additional nice properties which the others may or may not have.
For an agent investing in a stock whose “instantaneous” price movements are i.i.d. with finite moments:
Suppose, for simplicity, that the agent’s utility function is over the amount of money they have in the next timestep. (As opposed to more realistic cases like “amount they have 20 years from now”.)
If U(x)=ln(x), then:
The optimal leverage for the agent to take is given by the formula $L = m/(2s^2)$, where $m = \mathbb{E}[\text{return per timestep} - \text{risk-free return per timestep}]$ and $s$ is the standard deviation of the same. Derivation here. By my calculations, this implies a leverage of about 1.8 on the S&P 500.
What if we instead suppose the agent prefers some constant leverage $L = m/(2cs^2)$, and try to infer its utility function?
The relevant differential equation is $x\,U''(x) = -c\,U'(x)$.
This is solved by $U(x) = 1 - x^{1-c}$ for $c \neq 1$ and $U(x) = \ln(x)$ for $c = 1$. You can play with the solutions here.
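A quick symbolic check (mine, using sympy) that these two families really do satisfy $x\,U''(x) = -c\,U'(x)$:

```python
# Verify the two solutions of x*U''(x) = -c*U'(x) by substitution.
import sympy as sp

x, c = sp.symbols("x c", positive=True)

U_general = 1 - x ** (1 - c)   # the c != 1 solution
U_log = sp.log(x)              # the c = 1 solution

print(sp.simplify(x * sp.diff(U_general, x, 2) + c * sp.diff(U_general, x)))  # 0
print(sp.simplify(x * sp.diff(U_log, x, 2) + 1 * sp.diff(U_log, x)))          # 0
```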
Now suppose instead that the agent’s utility function is “logarithmic withdrawals, time-discounted exponentially”: $U = \int_{t=0}^{\infty} \ln(w(t))\, e^{\gamma t}\, dt$, where $w(t)$ is the absolute[2] rate of withdrawal at time $t$. It turns out that optimal leverage is still constant, and is still given by the same formula $L = m/(2s^2)$. Furthermore, the optimal rate of withdrawal is a constant $w(t) = 1 - \gamma$, regardless of what happens.
Things probably don’t work out as cleanly for the non-logarithmic case.
[Disclaimer: This is not investment advice.]
Caveats:
1. This assumption of constant leverage is pretty arbitrary, so there’s no normative or descriptive force to the class of utility functions we derive from it
2. We have to make an unrealistic assumption that the utility function is over $$ at the next timestep, rather than further in the future. In the log case, these kind of assumptions tend to not change anything, but I’m not sure whether the general case is as clean.
i.e. in dollars, not percents
extreme nit, you probably meant for this to be lowercase. I love this series!
I was wondering if $\exp \circ H$ is anything. I don’t recognize $\frac{1}{\prod_k p_k^{p_k}}$, though.
it’s not intuitive to me when it’s reasonable to apply geometric rationality in an arbitrary context.
e.g. if i offered you a coin flip where i give you $0.01 with p=50%, and $100 with q=50%, i get $\mathbb{G} = \sqrt{0.01}\cdot\sqrt{100} = \$1$, which like, obviously you would go bankrupt really fast valuing things this way.

in kelly logic, i’m instead supposed to take the geometric average of my entire wealth in each scenario, so if i start with $1000, I’m supposed to take $\sqrt{1000.01}\cdot\sqrt{1100} \approx \$1048.81$, which does the nice, intuitive thing of penalizing me a little vs. linear expectation for the added volatility.
but… what’s the actual rule for knowing the first approach is wrong?
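(For reference, a small sketch that just reproduces the arithmetic in the comment above, the geometric expectation of the bare payoff vs. of total wealth; it doesn’t answer the question of which is the right object to maximize.)

```python
# The two geometric expectations from the coin-flip example above.
import numpy as np

probs = np.array([0.5, 0.5])
payoffs = np.array([0.01, 100.0])
wealth0 = 1000.0

g_payoff = np.prod(payoffs ** probs)              # sqrt(0.01) * sqrt(100) = 1.0
g_wealth = np.prod((wealth0 + payoffs) ** probs)  # sqrt(1000.01) * sqrt(1100) ≈ 1048.81
print(g_payoff, g_wealth)
```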
Another way of looking at this question: Arithmetic rationality is shift invariant, so you don’t have to know your total balance to calculate expected values of bets. Whereas for geometric rationality, you need to know where the zero point is, since it’s not shift invariant.
I think the rule is “you maximize your bank account, not the addition to it”. I.e. your value of deals depends on how much you already have.
Very minor, but shouldn’t this read “P is a probability distribution on X”, not Y?