Kelly is (just) about logarithmic utility

abramdemski, Mar 1, 2021, 8:02 PM



This post is a response to SimonM’s post, Kelly isn’t (just) about logarithmic utility. It’s an edited and extended version of some of my comments there.

To summarize the whole idea of this post: I’m going to argue that any argument in favor of the Kelly formula has to go through an implication that your utility is logarithmic in money, at some point. If it seems not to, it’s either:

mistaken
cleverly hiding the implication
some mind-blowing argument I haven’t seen before.

Actually, the post I’m responding to already mentioned one argument in this third category, which I’ll mention later. But for the most part I think the point still stands: the best reasons to suppose Kelly is a good heuristic go through arguing logarithmic utility.

The main point of this post is to complain about bad arguments for Kelly—something which I apparently enjoy doing rather a lot. Take that as an attention-conservation warning.

The rest of this post will consider various arguments in favor of the Kelly criterion (either as a decent rule of thumb, or, as the iron law of investment). Each section considers one argument, with a section title hopefully descriptive of the argument considered.

1: It’s About Repeated Bets

This argument goes something like: “If you were to make just one bet, the right thing to do would be to maximize expected value; but for repeated bets, if you bet everything, you’ll lose all your money quickly. The Kelly strategy adjusts for this.”

A real example of this argument, from the comments:

Kelly maximizes expected geometric growth rate. Therefore over enough bets Kelly maximizes expected, i.e. mean, wealth, not merely median wealth.

This just doesn’t work out. Maximizing geometric growth rate is not the same as maximizing mean value. It turns out Kelly favors the first at a severe cost to the second.

Suppose you’d just want to maximize expected money in a single-bet case.

A Bayesian wants to maximize $E [u (S \cdot x)]$ , where $x$ is your starting money and $S$ is a random variable for the payoff-per-dollar of your strategy. In a two-step scenario, the Bayesian wants to maximize $E [u (S_{1} \cdot S_{2} \cdot x)]$ . And so on.

If your preferred one-step strategy is one which maximizes expected money, this means $u (x) = x$ for you. But this allows us to push the expectation inwards. Look at the two-step case: $E [u (S_{1} \cdot S_{2} \cdot x)]$ $= E [S_{2} \cdot S_{1} \cdot x]$ $= E [S_{1}] \cdot E [S_{2}] \cdot x$ (the last step holds because we assume the random variables are independent). So we maximize the total expected money by maximizing the expected money of $S_{1}$ and $S_{2}$ individually.

Similarly for any number of steps: you just maximize the expectation in each step individually.
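As a quick check on the factoring step, here is a minimal sketch that enumerates every win/loss sequence for two independent double-or-nothing bets. The 51% win probability is borrowed from the example in the next paragraph; the function names, the $100 starting stake, and the grid of candidate fractions are assumptions made purely for illustration.

```python
from itertools import product

def expected_wealth(fractions, p=0.51, start=100.0):
    """Exact expected final wealth for a sequence of independent
    double-or-nothing bets, staking fractions[i] of current wealth at step i."""
    total = 0.0
    for outcomes in product([True, False], repeat=len(fractions)):
        prob, wealth = 1.0, start
        for f, win in zip(fractions, outcomes):
            prob *= p if win else 1.0 - p
            wealth *= (1.0 + f) if win else (1.0 - f)
        total += prob * wealth
    return total

def step_mean(f, p=0.51):
    """Expected per-dollar payoff E[S] of a single bet with fraction f."""
    return p * (1.0 + f) + (1.0 - p) * (1.0 - f)

# Hypothetical grid of betting fractions: 0.0, 0.1, ..., 1.0
grid = [i / 10 for i in range(11)]

# Jointly optimal pair of fractions for the two-step problem.
best = max(product(grid, repeat=2), key=expected_wealth)
print(best)
```

The joint optimum is to stake everything at both steps, which is the same answer you get by maximizing each step's expected payoff separately.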

Note that the resulting behavior will be crazy. If you had a 51% chance of winning a double-or-nothing bet, you’d want to bet all the money you have. By your own probability estimates, you stand a 49% chance of losing everything. From a standard-human perspective, this looks quite financially irresponsible. It gets even worse for repeated bets. The strategy is basically “bet all your money at every opportunity, until you lose everything.” Losing everything would become a virtual certainty after only a few bets—but the expectation maximizer doesn’t care. The expectation maximizer happily trades away the majority of worlds, in return for amassing exponentially huge sums in the lucky world where they keep winning.
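To put rough numbers on this, the following sketch computes the survival probability and the expected wealth of the all-in strategy. The function name and the $100 starting stake are illustrative assumptions; the 51% double-or-nothing bet is the one described above.

```python
def all_in(n_bets, p=0.51, start=100.0):
    """Return (survival probability, expected wealth) after n_bets
    all-in double-or-nothing bets, each won with probability p."""
    survival = p ** n_bets
    # Wealth is start * 2**n_bets if every bet is won, otherwise 0.
    expected = survival * start * 2 ** n_bets
    return survival, expected

for n in (1, 5, 10, 20):
    s, e = all_in(n)
    print(f"{n:2d} bets: survive {s:.4%}, expected ${e:,.2f}")
```

The expected wealth grows by 2% per bet, while the chance of still having any money shrinks geometrically. That is the trade the expectation maximizer is accepting.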

(“And that’s the right thing to do, for their values!” says the one.

“Is it, though?” says the other. “That’s putting the cart before the horse. In Bayesian utility theory, you first figure out what the preferences are, and then you figure out a utility function to represent those preferences. You shouldn’t just go from caring about money to naively maximizing expected money.”

“True,” says the one. “But there is a set of preferences which someone could have, which would imply that utility function.”)

So, my conclusion? If you don’t prefer maximizing expected money for repeated bets (and you probably don’t), then you must not prefer it for a single-shot bet, either.

Nothing about expected value maximization breaks when we apply it to multiple decisions across time. The culprit is the utility function. If the Kelly criterion is appealing, it must be because your utility is approximately logarithmic.

(By the way, this section shouldn’t be confused for arguing against every possible argument for Kelly that involves repeated bets. The current section is only arguing against the super naive argument which claims Kelly is some kind of adjustment to expectation-maximization to handle the repeated-bets case.)

2: It’s About Optimizing Typical Outcomes

I won’t fully go through the standard derivation of Kelly, but it goes something like this. First, we suppose a specific type of investment opportunity will pay out with probability $p$ . Then, we suppose we face similar opportunities many times. We note that the fraction of successes must be very close to $p$ . Then, under that assumption, we do some math to figure out what the optimal investment strategy is.

For example, suppose we play a game: you start with $100, and I start with $ $\infty$ . We’ll make bets on a fair coin; whatever you wager, I’ll multiply it by 3 if the coin comes up heads. However, if the coin comes up tails, I’ll take it all. We will flip exactly 100 times. How will you decide how much to bet each time? The Kelly derivation is saying: choose your optimal strategy by assuming there will be exactly 50 heads and 50 tails. This won’t be exactly true, but it’s probably close; if we flipped even more times, then it would be more certain that we’d be very close to that ratio.

The main point I want to make about this is that it’s not much of an argument for using the Kelly formula. Just because most worlds look very close to the 50-50 world, doesn’t mean planning optimally for the 50-50 world is close to optimal in general.

Suppose you consider betting half your money every time, in our game. The Kelly evaluation strategy goes like this: when you win, you double your money (because you keep ¹⁄₂, and put ¹⁄₂ on the line; I triple that sum, to ³⁄₂; combining that with the ¹⁄₂ you saved, you’ve doubled your money). When you lose, you halve your money. Since you’ll win and lose equally many times, you’d break even with this strategy, keeping $100; so, it’s no better than keeping all your money and never betting a cent. (The Kelly recommendation for this game is to bet ¹⁄₄ of your money each time; ¹⁄₂ is far too much.)
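The Kelly fraction for this game can be recovered with a short calculation. This is a sketch in my own notation, not something from the original derivation: $f$ is the fraction of current wealth staked each round, $N$ is the number of flips, $x_{N}$ is the final wealth, and $g(f)$ is the per-round log growth in the idealized world with exactly $N/2$ heads.

$$x_{N} = x \cdot (1 + 2f)^{N/2} \cdot (1 - f)^{N/2}$$

$$g(f) = \frac{1}{N} \log \frac{x_{N}}{x} = \frac{1}{2} \log (1 + 2f) + \frac{1}{2} \log (1 - f)$$

$$g'(f) = \frac{1}{1 + 2f} - \frac{1}{2 (1 - f)} = 0 \;\Rightarrow\; 2 (1 - f) = 1 + 2f \;\Rightarrow\; f = \frac{1}{4}$$

Plugging in $f = 1/2$ gives $g = \frac{1}{2} \log 2 + \frac{1}{2} \log \frac{1}{2} = 0$, which matches the break-even result for betting half.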

But consider: 51-49 and 49-51 are both quite probable as well, almost as probable as the 50-50 outcome. In one case, you double your money one more time, and halve it one less time. So you’ll end with $400. In the other case, just the opposite, so you’ll end with $25.

Do these two possibilities cancel out, so that we can act like the 50-50 case is all that matters? Not to an expected-money maximizer; the average between $400 and $25 is $212.50; a significant gain over $100. So now it sounds like this strategy might not be so close to breaking even after all.
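The arithmetic here is easy to reproduce. Below is a minimal sketch for the bet-half strategy in the 100-flip game; the function names are my own, and the exact mean is computed by weighting every possible head count by its binomial probability.

```python
from math import comb

def final_wealth(heads, flips=100, fraction=0.5, start=100.0):
    """Wealth after `flips` bets with a fixed betting fraction, where a
    win triples the stake and a loss forfeits it."""
    tails = flips - heads
    return start * (1 + 2 * fraction) ** heads * (1 - fraction) ** tails

def expected_wealth(flips=100, fraction=0.5, start=100.0):
    """Exact mean of final wealth over the binomial number of heads."""
    return sum(
        comb(flips, h) * 0.5 ** flips * final_wealth(h, flips, fraction, start)
        for h in range(flips + 1)
    )

for heads in (49, 50, 51):
    print(heads, final_wealth(heads))

print(expected_wealth())
```

The three printed wealth values are the $25, $100, and $400 figures from the text. The full expectation is far larger still, because each flip multiplies expected wealth by 1.25, and the rare worlds with many heads dominate the mean.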

Generally speaking, although the ratio of success to failure will converge to $p$ , the absolute difference between the true number of successes and the number expected by the Kelly analysis won’t converge to zero. And the small deviations in ratio will continue to make large differences in value, like those above. So why should we care that the ratio converges?

Ok. It’s hard to justify taking only the single most probable world (like the 50-50 world) and planning for that one. But there are steelmen of the basic argument. As John Wentworth said:

maximizing modal/median/any-fixed-quantile wealth will all result in the Kelly rule

The discussion above can be thought of as maximizing the mode (choosing the strategy which maximizes the most probable amount of money we might get). John points out that we can choose many other notions of “typical outcome”, and get the same result. Just so long as we don’t optimize the mean (which gets us the expected-money strategy again), we end up with the Kelly strategy.

Optimizing for the mode/median/quantile is usually a significantly worse idea than optimizing expected utility. For example, optimizing for median utility just means ranking every possibility from worst to best (with a number of copies based on its probability), and judging how well we’re doing by looking at the possibility which ends up at the halfway point. This is perfectly consistent with a 49% chance of extreme failure; median-utility-optimization doesn’t care how bad the worst 49% is. This is really implausible, as a normative (or descriptive) theory of risk management.

The fixed-quantile-maximizer allows us to tweak this. We can look at the bottom 2% mark (ie an outcome close to the bottom of the list), so that we can’t be ignoring a terrible disaster that’s got almost 50% probability. But this is insensitive to really good outcomes vs merely moderately good ones, until they cross the 98% probability line. For example, if a task just inherently has a 10% chance of bad-as-it-can-be failure (which there’s nothing you can do about), the 2%-quantile-maximizer won’t optimize at all; any option will look equally bad to it.
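Both failure modes can be shown with toy lotteries. This is a sketch only: the lotteries and their utility numbers are invented for illustration, and `quantile` implements the "rank every possibility from worst to best" procedure described above.

```python
def quantile(lottery, q):
    """Outcome at quantile q of a lottery given as (probability, utility)
    pairs: sort outcomes from worst to best and walk up until the
    cumulative probability reaches q."""
    cumulative = 0.0
    for prob, utility in sorted(lottery, key=lambda pair: pair[1]):
        cumulative += prob
        if cumulative >= q - 1e-12:  # small tolerance for float rounding
            return utility
    return utility

def mean(lottery):
    """Ordinary expected utility."""
    return sum(prob * utility for prob, utility in lottery)

# Hypothetical lotteries: a sure thing versus a 49% chance of disaster.
safe = [(1.0, 10.0)]
risky = [(0.49, -1000.0), (0.51, 11.0)]
print(quantile(safe, 0.5), quantile(risky, 0.5))   # median comparison
print(mean(safe), mean(risky))                     # expected-utility comparison

# Hypothetical task with an unavoidable 10% chance of worst-case failure.
doomed_a = [(0.10, -100.0), (0.90, 1.0)]
doomed_b = [(0.10, -100.0), (0.90, 50.0)]
print(quantile(doomed_a, 0.02), quantile(doomed_b, 0.02))
```

The median ranks `risky` above `safe` even though its mean is disastrous, and the 2% quantile cannot tell `doomed_a` from `doomed_b` even though the second is clearly better.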

If all of these choices are terrible in general, why should we find them at all plausible in the particular case of justifying the Kelly rule?

So no one should see the Kelly derivation and think “OK, Kelly maximizes long-run profits, great.”

Instead, I think the Kelly derivation and related arguments should be seen as much more indirect. We look at this behavior Kelly recommends, and we say to ourselves, “OK, this seems pretty reasonable.” And we look at the behavior which expected money-maximization recommends, and we say, “No, that looks entirely unreasonable.” And we conclude that our preferences must be closer to those of a Kelly agent than those of an expected-money maximizer.

In other words, we conclude that our utility is approximately logarithmic in money, rather than linear.

(A conclusion which is, by the way, very plausible on other grounds [Economic Growth and Subjective Well-Being: Reassessing the Easterlin Paradox. Betsey Stevenson and Justin Wolfers.].)

3: It’s About Time-Averaging Rather Than Ensemble-Averaging

A new approach to economic decision-making called Ergodicity Economics, primarily developed by Ole Peters, attempts to make a much more sophisticated argument similar to “Kelly is about repeated bets”. It is not simply the naive argument I dismissed in the first section. I think it’s much more interesting. But, ultimately, I think it’s not that convincing.

I won’t be able to explain the whole thing in this post, but one of the central ideas is time-averaging rather than ensemble-averaging. Ole Peters critiques Bayesians for averaging over possibilities. He states that ensemble averages are appropriate when a lot of things are happening in parallel, like insurance companies tabulating death rates to ensure their income is sufficient for what they’ll have to pay out. However, when you’re an individual, you only die once. When things happen sequentially, you should be taking the time-average.

Peters’ approach addresses many more things than just the Kelly formula—just to be clear. It’s just one particular case we can analyze. But, here’s roughly what Peters would do for that case. We can’t time-average our profits, since those can keep increasing boundlessly. (As we accumulate more money to bet with, we can make larger bets, so the average winnings could just go to infinity.) So we look at the ratio of our money from one round to the next. This, it turns out, we can time-average. And what strategy maximizes that time-average? Kelly, of course!
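Here is a rough sketch of that construction, applied to the coin game from section 2. I am assuming the quantity being time-averaged is the logarithm of the round-to-round wealth ratio (the exponential growth rate); that is my reading, not something spelled out above. The function names, the seed, and the grid of fractions are illustrative.

```python
import math
import random

def time_average_growth(fraction, n_rounds=100_000, seed=0):
    """Average of log(wealth ratio) along one long simulated sequence of
    fair-coin bets where a win triples the stake."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rounds):
        ratio = 1 + 2 * fraction if rng.random() < 0.5 else 1 - fraction
        total += math.log(ratio)
    return total / n_rounds

def exact_growth(fraction):
    """Expected log ratio per round, which the time average approaches
    as the number of rounds grows."""
    return 0.5 * math.log(1 + 2 * fraction) + 0.5 * math.log(1 - fraction)

# Candidate fractions 0.00, 0.01, ..., 0.99 (betting everything gives log(0)).
grid = [i / 100 for i in range(100)]
best = max(grid, key=exact_growth)
print(best, exact_growth(best), time_average_growth(best))
```

On this grid the maximizer is the Kelly fraction of 0.25, and the simulated single-trajectory average lands close to the exact value. Note that the logarithm enters through the choice of what to average, which is the degree of freedom the next paragraph complains about.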

My problem with this is mainly that it seems very ad-hoc. I would be somewhat more impressed if someone could prove that there was a unique correct choice of what to maximize, rather than just creatively coming up with something that can be time-averaged, and then declaring that we should maximize that. This seems suspiciously close to just taking a logarithm without any justification.

Not only do we have to choose a function to time-average, we also have to select an appropriate way to turn our situation into an iterated game. This isn’t a difficulty in the Kelly case, but in principle, it’s another degree of freedom in the analysis, which makes the results feel more arbitrary. (If you’re a Bayesian who can represent your life as a big game tree where all the branches end in death, how would you abstract out isolated situations as infinitely-iterated games, in order to apply the Peters construction?)

4: It’s About Convergent Instrumental Goals

The basic idea of this argument is similar to the naive first argument we discussed: argue that repeated bets bring you closer and closer to logarithmic utility. Unlike the first attempt, we now grant that linear utility doesn’t work this way. But maybe linear utility is a very special case.

Suppose you need $5 to ride the bus. Nothing else is significant to you right now. We can think of your utility as 1u if you have $5 or more, and 0u otherwise.

Now suppose someone approaches you with a bet at the bus stop. It’s a double-or-nothing bet. You yourself are 50-50 on the outcome, so ordinarily, it wouldn’t be worth taking. In this case, however, the bet could save you: if you have $2.50 or more, the bet could give you a 50% chance at $5, so you could ride the bus!

So now your expected utility, as a function of money in your pocket at the beginning of the scenario, is actually a two-step function: 0u for less than $2.50, 0.5u from $2.50 to <$5, and 1u for $5 and up.
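The two-step function falls out of a small optimization over how much to stake. The sketch below assumes you may stake any multiple of 25 cents up to what you hold (the stake grid is my assumption, as are the function names), and that declining the bet is always allowed.

```python
def utility(money):
    """1 util if you can afford the $5 bus fare, else 0."""
    return 1.0 if money >= 5.0 else 0.0

def derived_utility(money, p_win=0.5, step=0.25):
    """Best achievable expected utility when a single double-or-nothing
    bet is on offer. Stakes are restricted to a hypothetical grid of
    multiples of `step`, from 0 (decline) up to the money held."""
    stakes = [i * step for i in range(int(money / step) + 1)]
    return max(
        p_win * utility(money + s) + (1 - p_win) * utility(money - s)
        for s in stakes
    )

for money in (1.0, 2.0, 2.5, 4.0, 5.0, 8.0):
    print(money, derived_utility(money))
```

Below $2.50 no stake can reach the fare, between $2.50 and $5 the best move is to stake enough to reach $5 on a win, and at $5 or more the best move is not to bet.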

What’s important about this scenario is that the bet changed your expected value function. Mossin (who I’ll discuss more in a bit) calls this your derived utility function.

In the first section, I showed that this doesn’t happen for linear utility functions. If your utility function is linear, your derived utility function is also linear. Mossin calls functions with this property myopic, because they can make each decision as if it was their last.

Log utility is also myopic, just like linear utility: $E [\log (S_{1} \cdot S_{2} \cdot x)]$ $= E [\log (S_{1}) + \log (S_{2}) + \log (x)]$ $= E [\log (S_{1})] + E [\log (S_{2})] + E [\log (x)]$ . Maximizing long-term log-money breaks down to maximizing the log-utility of each step.
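A numerical check of this decomposition, as a sketch: it reuses the coin game from section 2 (fair coin, a win triples the stake) as an assumed example, and the function names and the grid of fractions are my own. It searches jointly over both betting fractions and compares the result with the one-step optimum.

```python
import math
from itertools import product

P_WIN, GAIN = 0.5, 2.0  # fair coin; a win returns the stake plus 2x the stake

def one_step_log(f):
    """E[log(S)] for a single bet with fraction f."""
    return P_WIN * math.log(1 + GAIN * f) + (1 - P_WIN) * math.log(1 - f)

def two_step_log(f1, f2, x=100.0):
    """E[log(S1 * S2 * x)] by enumerating the four win/loss combinations."""
    total = 0.0
    for win1, win2 in product([True, False], repeat=2):
        prob = (P_WIN if win1 else 1 - P_WIN) * (P_WIN if win2 else 1 - P_WIN)
        s1 = 1 + GAIN * f1 if win1 else 1 - f1
        s2 = 1 + GAIN * f2 if win2 else 1 - f2
        total += prob * math.log(s1 * s2 * x)
    return total

grid = [i / 20 for i in range(20)]  # 0.00, 0.05, ..., 0.95
best_single = max(grid, key=one_step_log)
best_pair = max(product(grid, repeat=2), key=lambda fs: two_step_log(*fs))
print(best_single, best_pair)
```

The jointly optimal pair is just the one-step optimum repeated, so planning for two steps changes nothing about how a log-utility agent bets on the first.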

If you know a little dynamical systems theory, you might be thinking: aha, we know these are fixed points, but is one of these points an attractor? Perhaps risk-averse functions which somewhat resemble logarithmic functions will have derived utility functions which are a bit closer to logarithmic, so that when we face many many bets, our derived utility function will become very close to logarithmic.

If true, this would be a significant vindication of the Kelly rule! Imagine that you’re a stock trader who plans to retire at a specific date. Your utility is some function of the amount of money you retire with. The above argument would say: your derived utility function is the result of many, many, bets. So, as long as your utility function meets some basic conditions (eg, isn’t linear), your derived utility function will be a close approximation of a logarithm!

Until I read SimonM’s post, I actually thought this was true. However, SimonM says the following:

“Optimal Multiperiod Portfolio Policies” (Mossin) shows that for a wide class of utilities, optimising utility of wealth at time t is equivalent to maximising utility at each time-step.

IE, Mossin shows that a lot of utility functions actually are myopic! Not all utility functions, by any means, but enough to break the hope that logarithmic utility is a strong attractor.

So, for a large class $^{1}$ of utility functions, the “Kelly is about repeated bets” argument fails just as hard as it did for the linear case.

This is really surprising!

So it appears we can’t argue that log utility is a convergent instrumental goal. It’s not true that a broad variety of agents will want to Kelly-bet in the short term in order to maximize utility in the long term. This seems like a pretty bad sign for SimonM’s argument that Kelly is about repeated bets. $^{2}$

If anyone thinks they can recover this argument, please let me know! It’s still possible that some class of functions has this property. It’s just that now we know we need to side-step a lot of functions, not just linear functions. So we won’t be able to push the argument through with weak assumptions, EG, “any risk-averse function implies approximately logarithmic derived utility”. However, it’s still possible that all of Mossin’s myopic functions are “unrealistic” in some way, so that we can still argue Kelly is an instrumentally convergent strategy for humans.

But I currently see no reason to suspect this.

5: It’s About Beating Everyone Else

At the beginning of this post, I mentioned that SimonM did give one result which neither seems mistaken, nor seems to be about logarithmic utility. Here’s what SimonM says:

“Competitive optimality”. Any other strategy can only beat Kelly at most ¹⁄₂ the time. (1/2 is optimal since the other strategy could be Kelly)

This is true because Kelly optimizes median utility. No other strategy can have higher median utility; so, given any other strategy, Kelly must be better at least half the time.
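For a concrete feel, here is a sketch that computes how often a rival beats Kelly in the 100-flip coin game from section 2. It only covers rivals who bet a fixed fraction every round and who face the same flips as the Kelly bettor; both restrictions, and the function name, are my assumptions. It is a spot check of the claim, not a proof.

```python
from math import comb, log

def prob_beats_kelly(fraction, flips=100, kelly=0.25):
    """Exact probability that a fixed-fraction bettor (fraction < 1) ends
    with strictly more money than a Kelly bettor facing the same fair-coin
    flips, where a win triples the stake."""
    def log_wealth(f, heads):
        # Log of the final wealth multiplier after `heads` wins.
        return heads * log(1 + 2 * f) + (flips - heads) * log(1 - f)

    winning_counts = sum(
        comb(flips, h)
        for h in range(flips + 1)
        if log_wealth(fraction, h) > log_wealth(kelly, h)
    )
    return winning_counts / 2 ** flips

for f in (0.05, 0.1, 0.2, 0.3, 0.5, 0.9):
    print(f, prob_beats_kelly(f))
```

Every rival fraction tried here comes out ahead of Kelly in fewer than half of the possible worlds. More aggressive bettors win only when heads are unusually common, and more cautious bettors only when they are unusually rare.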

Humans have a pretty big competitive component to our preferences. People enjoy being the richest person they know. So, this could plausibly be relevant for someone’s betting strategy, and doesn’t require logarithmic utility.

I’ve also heard it said that a market will evolve to be dominated by Kelly bettors. I think this basically refers to the idea that in the long run, you can expect Kelly bettors to have higher wealth than anyone else with arbitrarily high probability (because Kelly maximizes any quantile, not just median). However, I was curious if Kelly comes out on top in a more literally evolutionary model. The Growth of Relative Wealth and the Kelly Criterion examines this question. I haven’t looked at it in-depth, but it appears the answer is “sometimes”.

Conclusion: To Kelly, Or Not To Kelly?

My experience writing this post has been a progressive realization that the argument for the Kelly criterion is actually much weaker than I thought. I expected to mainly look at arguments for Kelly and show how they have to go through an assumption tantamount to log-utility. Instead, I spent more time finding that the arguments were just not very good.

When I responded to ideas about optimizing mode/median/quantiles in the comment section to SimonM’s post, my objection was just “it’s important to point out that you’re optimizing mode/median/quantile, rather than the more usual expected-value”. But now I’m like: optimizing mode/median/quantile is actually a pretty terrible principle, generally speaking! Why would we apply it here?
I had thought that some form of “instrumental convergence” argument would work, as discussed in section 4. But it appears not!

So before writing this post, my position was: Kelly is optimal in a non-Bayesian sense, which is peculiar, but seems oddly compelling. Within a Bayesian framework, we can “explain” this compellingness by supposing logarithmic utility. So it seems like the utility of money is roughly logarithmic for humans, which, anyway, is plausible on other grounds. Furthermore, risk-averse agents will have logarithmic expected values in practice, anyway, due to instrumental convergence. So it’s fair to say Kelly bets are approximately optimal for humans.

But now, I think: Kelly is optimal in a peculiar non-Bayesian sense, but it’s pretty terrible. $^{3}$ Furthermore, there’s no instrumental convergence to Kelly, as far as I can tell. So all I’m left with is: human utility appears to be approximately logarithmic in money, on other grounds.

Overall, this still suggests Kelly is a decent rule of thumb!

I certainly haven’t exhausted all the ways people have argued in favor of the Kelly criterion, either. If you think you know of an argument which isn’t addressed by any of my objections, let me know.

Footnotes

I should note that while SimonM says “a wide class”, Mossin instead says:

it will be shown that the only utility functions allowing myopic decision making are the logarithmic and power functions which we have encountered earlier

IE, Mossin seems to think of it as a narrow class. However, Mossin’s result is enough to block any approach I would have taken to proving some kind of convergence result. (I spent some time trying to prove a result while writing this, before I gave up and read Mossin.)

In case you’re curious, Mossin’s “power functions” are:

$u (x) = \frac{1}{\lambda - 1} (\mu + \lambda x)^{1 - 1 / \lambda}$

Where $\mu$ and $\lambda$ are some parameters which appear to be fixed by the surrounding context in the paper (not free), but I haven’t fully understood that part yet.

Mossin also discusses a broader class of weakly myopic functions. These utility functions aren’t quite the same as their derived functions, but I’m guessing they’re also going to be counterexamples to any attempted convergence result.

SimonM realizes that Mossin’s result poses a problem for his narrative, at least at a shallow level:

BUT HANG ON! I hear you say. Haven’t you just spent the last 5 paragraphs saying that Kelly is about repeated bets? If it all reduces to one period, why all the effort? The point is this: legible utilities need to handle the multi-period nature of the world. I have no (real) sense of what my utility function is, but I do know that I want my actions to be repeatable without risking ruin!

At first, I thought this was waffling and excuses; but on reflection, I entirely agree. As I said in section 2, I think the right argument for Kelly as a heuristic is the fairly indirect one: Kelly seems like a sane way of managing risk of ruin, so my preferences must be closer to logarithmic than (eg) linear.

I confess, although optimizing for mode/median/quantiles is not very good, I still find something interesting about the argument from section 2. The general principle “ignore extremely improbable extreme outcomes” seems like a hack, but it’s an interesting hack, since it blocks many philosophical problems (such as Pascal’s Wager). And, in this particular case, it seems oddly plausible: it intuitively seems like the expected-money-maximizer is doing something wrong, and a plausible analysis of that wrongness is that it happily trades away all its utility in increasingly many worlds, for a vanishing chance of happiness in tiny slivers of possibility-space. It would be nice to have solid principles which block this behavior. But mode/median/quantile maximization are not plausible as general principles.

Also, even though optimizing for mode/median/quantiles seems individually terrible, optimizing for them all at once is actually pretty good! My criticisms of the individual principles don’t apply when they’re all together. However, optimizing for all of them at once is not possible in general.
