I almost like what this post is trying to do, except that Kelly isn’t just about repeated bets. It’s about multiplicative returns and independent bets. If the returns from your bets add (rather than multiply), then Kelly isn’t optimal. This is the case, for instance, for many high-frequency traders—the opportunities they exploit have limited capacity, so if they had twice as much money, they would not actually be able to bet twice as much.
The logarithm in the “maximize expected log wealth” formulation is a reminder of that. If returns are multiplicative and bets are independent, then the long-run return will be the product of individual returns, and the log of long-run return will be the sum of individual log returns. That’s a sum of independent random variables, so we apply the central limit theorem, and find that long-run return is roughly e^((number of periods)*(average expected log return)). To maximize that, we maximize expected log return each timestep, which is the Kelly rule.
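For concreteness, here is a minimal sketch of "maximize expected log return each timestep" for a simple binary bet. The parameter names (`p` for win probability, `b` for net payout per unit staked, `f` for the fraction of wealth staked) are assumptions chosen for illustration and do not come from the post.

```python
import math

def expected_log_return(f, p, b):
    """Expected log growth per bet when staking fraction f of wealth.

    p: win probability, b: net payout per unit staked (illustrative names).
    """
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)

def kelly_closed_form(p, b):
    """Closed-form maximizer of expected_log_return for a binary bet."""
    return p - (1 - p) / b

def kelly_grid_search(p, b, steps=10_000):
    """Brute-force maximizer over f in [0, 1), as a sanity check."""
    candidates = [i / steps for i in range(steps)]  # f = 1 excluded: log(0)
    return max(candidates, key=lambda f: expected_log_return(f, p, b))

p, b = 0.6, 1.0
print(kelly_closed_form(p, b))  # closed-form Kelly fraction
print(kelly_grid_search(p, b))  # grid maximizer of expected log return
```

With a 60% win chance at even odds, both routes should land on staking about a fifth of wealth per bet.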
Wait, so, what’s actually the argument?
It looks to me like the argument is “returns are multiplicative, so the long-run behavior is log-normal. But we like normals better than log-normals, so we take a log. Now, since we took a log, when we maximize we’ll find that we’re maximizing log-wealth instead of wealth.”
But what if I like log-normals fine, so I don’t transform things to get a regular normal distribution? Then when I maximize, I’m maximizing raw money rather than log money.
I don’t see a justification of the Kelly formula here.
The central limit theorem is used here to say “our long-run wealth will converge to e^((number of periods)*(average expected log return)), modulo error bars, with probability 1”. So, with probability 1, that’s the wealth we get (within error), and maximizing modal/median/any-fixed-quantile wealth will all result in the Kelly rule.
Under ordinary conditions, it’s pretty safe to argue “such and such with probability 1, therefore, it’s safe to pretend such-and-such”. But this happens to be a case where doing so makes us greatly diverge from the Bayesian analysis: it ignores the “most important” worlds from a pure expectation-maximization perspective (i.e., the ones where we repeatedly win bets, amassing huge sums).
So I’m very against sweeping that particular part of the reasoning under the rug. It’s a reasonable argument, it’s just one that imho should come equipped with big warning lights saying “NOTE: THIS IS A SEVERELY NON-BAYESIAN STEP IN THE REASONING. DO NOT CONFUSE THIS WITH EXPECTATION MAXIMIZATION.”
Instead, you simply said:
“To maximize that, we maximize expected log return each timestep, which is the Kelly rule.”
which, to my eye, doesn’t provide any warning to the reader. I just really think this kind of argument should be explicit about maximizing modal/median/any-fixed-quantile rather than the more common expectation maximization. Because people should be aware if one of their ideas about rational agency is based on mode/median/quantile maximization rather than expectation maximization.
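To make the gap between the two objectives concrete, here is a rough sketch comparing exact expected wealth with median wealth for a fixed-fraction bettor. The setup (a binary bet with win probability `p`, net odds `b`, and `n` independent rounds) and all function names are assumptions for illustration.

```python
import math

def expected_wealth(f, p, b, n):
    """Exact expected wealth multiple after n independent bets at fraction f."""
    return (1 + f * (p * b - (1 - p))) ** n

def median_wins(p, n):
    """Smallest k with P(wins <= k) >= 1/2 for a Binomial(n, p)."""
    cumulative = 0.0
    for k in range(n + 1):
        cumulative += math.comb(n, k) * p**k * (1 - p) ** (n - k)
        if cumulative >= 0.5:
            return k
    return n

def median_wealth(f, p, b, n):
    """Median wealth multiple; wealth is nondecreasing in the number of wins."""
    k = median_wins(p, n)
    return (1 + b * f) ** k * (1 - f) ** (n - k)

p, b, n = 0.6, 1.0, 100
for f in (0.2, 0.5, 1.0):
    print(f, expected_wealth(f, p, b, n), median_wealth(f, p, b, n))
```

In this toy setting, expected wealth keeps rising as the stake goes to 100% of wealth, while median wealth peaks near the Kelly fraction and is zero at a full stake.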
“So, with probability 1, that’s the wealth we get (within error), and maximizing modal/median/any-fixed-quantile wealth will all result in the Kelly rule.”
Sorry, but I’m pulling out my “wait, what’s actually the claim here?” guns again.
Kelly maximizes a kind of pseudo-mode. For example, for a sequence of two bets on coin flips, Kelly optimizes the world where you win one and lose one, which is the mode. However, for three such bets, Kelly optimizes the world where you win 1.5 and lose 1.5, which isn’t even physically possible.
At first I thought there would be an obvious sense in which this is approximately mode-optimizing, getting closer and closer to mode-optimal in the limit. And maybe so. But it’s not obvious.
The pseudo-mode is never more than 1 outcome away from the true mode. However, one bet can make a lot of difference, and can make more difference if we have more rounds to accumulate money. So certainly we can’t say that Kelly’s maximization problem (I mean the maximization of wealth in the pseudo-mode world) becomes epsilon close to true mode-optimization, in terms of numerical measurement of the quality of a given strategy.
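A small sketch of the pseudo-mode versus true-mode gap, restricted to fixed-fraction strategies (so it says nothing about cleverer non-fixed strategies). It assumes a binary bet with win probability `p` and net odds `b`; these names and the choice of `p = 0.6` are illustrative, not from the discussion.

```python
import math

def best_fixed_fraction_for_wins(k, n, b):
    """Fixed fraction maximizing (1 + b f)^k (1 - f)^(n - k), clipped to [0, 1]."""
    f = k / n - (1 - k / n) / b
    return min(max(f, 0.0), 1.0)

def true_mode_wins(p, n):
    """A mode of Binomial(n, p): floor((n + 1) p), capped at n."""
    return min(math.floor((n + 1) * p), n)

def wealth(f, k, n, b):
    """Wealth multiple after k wins and n - k losses at fixed fraction f."""
    return (1 + b * f) ** k * (1 - f) ** (n - k)

p, b = 0.6, 1.0
kelly = p - (1 - p) / b  # optimizes the pseudo-mode world with n * p wins
for n in (3, 5):
    k = true_mode_wins(p, n)
    f_mode = best_fixed_fraction_for_wins(k, n, b)
    print(n, k, kelly, f_mode, wealth(kelly, k, n, b), wealth(f_mode, k, n, b))
```

With three bets the true mode is two wins, and the fixed fraction that does best in that world is about a third rather than the Kelly fifth; with five bets the two coincide because the pseudo-mode is an attainable outcome.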
I’m not even sure that the mode-optimizing strategy is a fixed-fraction strategy like Kelly. Maybe a mode-optimizing strategy does some clever things in worlds where it starts winning unusually much, to cluster those worlds together and make their winnings into a mode that’s much better than the mode of Kelly.
“our long-run wealth will converge to e^((number of periods)*(average expected log return)), modulo error bars”
If the error bars became epsilon-close, then the argument would obviously work fine: the mode/median/quantile would all be very close to the same number, and this number would itself be very close to the pseudo-mode Kelly optimizes. So then Kelly would clearly come very close to optimizing all these numbers.
If the error bars became epsilon-close in ratio instead of in absolute difference, we could get a weaker but still nice guarantee: Kelly might fall short of mode-optimal by millions of dollars, but only in situations where “millions” is negligible compared to total wealth. This is comforting if we have diminishing returns in money.
But because any one bet can change wealth by a non-vanishing fraction (in general, and in particular when following Kelly), we know neither of those bounds can hold.
So in what sense, if any, does Kelly maximize mode/median/quantile wealth?
I’m suspecting that it may only maximize approximate mode/median/quantile, rather than approximately maximizing mode/median/quantile.
Just think of the whole thing on a log scale. The error bars become epsilon-close in ratio on a log scale. There are some terms and conditions to that approximation (average expected log return must be nonzero, for instance), but it shouldn’t be terribly restrictive.
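As a rough numerical check of the log-scale claim, the sketch below computes the spread of total log wealth relative to its mean for a fixed-fraction bettor, alongside the raw wealth ratio between two worlds that differ by a single win. The binary-bet setup and the names `p`, `b`, `f` are assumptions for illustration.

```python
import math

def log_return_moments(f, p, b):
    """Mean and standard deviation of the per-bet log return at fraction f."""
    win, lose = math.log(1 + b * f), math.log(1 - f)
    mean = p * win + (1 - p) * lose
    var = p * (win - mean) ** 2 + (1 - p) * (lose - mean) ** 2
    return mean, math.sqrt(var)

def relative_log_spread(f, p, b, n):
    """Std of total log wealth divided by its mean after n independent bets."""
    mean, sd = log_return_moments(f, p, b)
    return (math.sqrt(n) * sd) / (n * mean)

def one_bet_wealth_ratio(f, b):
    """Raw wealth ratio between worlds differing by one win; independent of n."""
    return (1 + b * f) / (1 - f)

p, b, f = 0.6, 1.0, 0.2
for n in (100, 10_000, 1_000_000):
    print(n, relative_log_spread(f, p, b, n), one_bet_wealth_ratio(f, b))
```

The first number shrinks like 1/sqrt(n), while the second stays fixed, which is the sense in which the error bars tighten only on the log scale.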
If your immediate thought is “but why consider things on a log scale in the first place?”, then remember that we’re talking about mode/order statistics, so monotonic transformations are totally fine.
(Really, though, if we want to be precise… the exact property which makes Kelly interesting is that if you take the Kelly strategy and any other strategy, and compare them to each other, then Kelly wins with probability 1 in the long run. That’s the property which tells us that Kelly should show up in evolved systems. We can get that conclusion directly from the central limit theorem argument: as long as the “average expected log return” term grows like O(n), and the error term grows like O(sqrt(n)) or slower, we get the result. In order for a non-Kelly strategy to beat Kelly in the long run with greater-than-0 probability, it would somehow have to grow the error term by O(n).)
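The head-to-head property can be checked exactly in a toy case. The sketch below assumes both strategies are fixed fractions betting on the same sequence of binary outcomes (win probability `p`, net odds `b`); the function name and parameters are illustrative.

```python
import math

def prob_kelly_ahead(f_other, p, b, n):
    """Exact P(Kelly wealth > other wealth) after n bets on the same outcomes.

    f_other must lie in [0, 1) and differ from the Kelly fraction.
    """
    f_kelly = p - (1 - p) / b
    # Per-bet log advantage of Kelly over the other fraction, on a win / a loss.
    adv_win = math.log((1 + b * f_kelly) / (1 + b * f_other))
    adv_loss = math.log((1 - f_kelly) / (1 - f_other))
    total = 0.0
    for k in range(n + 1):  # k = number of wins
        if k * adv_win + (n - k) * adv_loss > 0:
            total += math.comb(n, k) * p**k * (1 - p) ** (n - k)
    return total

for n in (10, 100, 1000):
    print(n, prob_kelly_ahead(0.5, p=0.6, b=1.0, n=n))
```

In this example the probability that Kelly is ahead of the more aggressive bettor climbs toward 1 as the number of bets grows, consistent with the O(n) drift dominating the O(sqrt(n)) error term.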
“If your immediate thought is “but why consider things on a log scale in the first place?”, then remember that we’re talking about mode/order statistics, so monotonic transformations are totally fine.”
Riiight, but, “totally fine” doesn’t here mean “Kelly approximately maximizes the mode as opposed to maximizing an approximate mode”, does it?
I have approximately two problems with this:
1. Bounding the ratio of log wealth compared to a true mode-maximizer would be reassuring if my utility was doubly logarithmic. But if it’s approximately logarithmic, this is little comfort.
2. But are we even bounding the ratio of log wealth to a true mode-maximizer? As I mentioned, I’m not sure a mode-maximizer is even a constant-fraction strategy.
Actually, you’re right, I goofed. A monotonically increasing transformation preserves the median and other order statistics, so e.g. max median F(u) = F(max median u) (since F commutes with both max and median), but the mode picks up an additional term from any nonlinear transformation of a continuous distribution. (It will still work for discrete distributions, i.e. max mode F(u) = F(max mode u) for u discrete, and in that case F doesn’t even have to be monotonic.)
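The standard example of the mode shifting under a nonlinear transformation is the log-normal. Here is a quick numerical sketch; the values of `mu` and `sigma` are arbitrary and the grid search is only a rough way to locate the mode.

```python
import math

def lognormal_pdf(x, mu, sigma):
    """Density of X = exp(Z) where Z ~ Normal(mu, sigma^2)."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))

def grid_mode(pdf, lo, hi, steps=200_000):
    """Argmax of a density on an evenly spaced grid (a rough numerical mode)."""
    xs = (lo + (hi - lo) * i / steps for i in range(1, steps))
    return max(xs, key=pdf)

mu, sigma = 1.0, 0.5
numeric_mode = grid_mode(lambda x: lognormal_pdf(x, mu, sigma), 0.0, 10.0)
print(numeric_mode)              # numerical mode of X
print(math.exp(mu - sigma**2))   # closed-form log-normal mode
print(math.exp(mu))              # exp(mode of Z), which is also the median of X
```

The median of X equals exp of the median of Z, but the mode of X sits below exp of the mode of Z by a factor of exp(-sigma^2).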
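So I guess the argument for median is roughly: we have some true optimum policy $\theta^*$ and Kelly policy $\theta_K$, and $\operatorname{median}_{P[u \mid \theta_K]} F(u) \approx \operatorname{median}_{P[u \mid \theta^*]} F(u)$, which implies $\operatorname{median}_{P[u \mid \theta_K]} u \approx \operatorname{median}_{P[u \mid \theta^*]} u$ as long as $F$ is continuous and strictly increasing.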
Ok, I buy that.
Sorry if I’m being dense, here.
Yeah—I agree, that was what I was trying to get at. I tried to address (the narrower point) here:
But I agree that giving some examples of where it doesn’t apply would probably have been helpful for showing when it is useful.