This (to be clear) is not fractional Kelly, since there I think we're talking about a situation where the fraction is constant.
In the same way that “the Kelly strategy” in practice refers to betting a variable fraction of your wealth (even if the simple scenarios used to illustrate/derive the formula involve the same bet repeatedly, so the Kelly strategy is one which implies betting a fixed fraction of wealth), I think it’s perfectly sensible to use “fractional Kelly” to describe a strategy which takes a variable fraction of the Kelly bet, using some formula to determine the fraction (even if the argument we use to establish the formula is one where a constant Kelly fraction is optimal).
What I would take issue with would be an argument for fractional Kelly which assumed we should use a constant Kelly fraction (as I said, "tying the agent's hands" by only looking at strategies where some constant Kelly fraction is chosen). Because then it's not clear whether some fractional Kelly is the best strategy for the described scenario; it's only clear that you've found the formula for the fraction that does best, given that you're using some fractional Kelly in the first place.
Which was one of my concerns about what might be going on with the first argument.
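To make the constant-versus-variable distinction concrete, here's a minimal sketch (my own illustration; the shrinkage rule in the last function is made up purely so that the fraction varies from bet to bet):

```python
def kelly_fraction(p, b=1.0):
    """Kelly stake for a bet paying b-to-1 when our win probability is p."""
    return (p * (b + 1) - 1) / b

def constant_fractional_kelly(p, lam=0.5, b=1.0):
    """The 'tied hands' version: always stake the same fixed fraction lam of the Kelly bet."""
    return lam * kelly_fraction(p, b)

def variable_fractional_kelly(p, uncertainty, b=1.0):
    """A variable fraction: lam is chosen per bet by some formula.
    The rule below is hypothetical, purely to illustrate the shape of such a strategy."""
    lam = 1.0 / (1.0 + uncertainty)  # made-up shrinkage rule, not from the post
    return lam * kelly_fraction(p, b)
```

Both count as "fractional Kelly" in the sense I mean; the question is whether the argument establishes the second kind or only the first.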
The result that “uncertainty ⇒ go sub-Kelly” is robust to different models of uncertainty.
I find myself really wishing that you'd use slightly more Bayesian terminology. Kelly betting is already a rule for betting under uncertainty. You're specifically saying that meta-uncertainty implies sub-Kelly. (Or parameter uncertainty, or whatever you want to call it.)
I’m trying to find the right Bayesian way to express this, without saying the words “true probability”.
I appreciate the effort :)
So the graph shows what happens if we take our uncertainty and keep it as-is, not updating on data, as we continue to bet?
Yes. Think of it as having a series of bets on different events with the same uncertainty each time.
Right… so in this case, it pretty strongly seems to me like the usual argument for Kelly applies. If you have a series of different bets in which you have the same meta-uncertainty, either your meta-uncertainty is calibrated, in which case your probability estimates will be calibrated, so the Kelly argument works as usual, or your meta-uncertainty is uncalibrated, in which case I just go meta on my earlier objections: why aren’t we updating our meta-uncertainty? I’m fine with assuming repeated different bets (from different reference classes) with the same parameter uncertainty being applied to all of them so long as it’s apparently sensible to apply the same meta-uncertainty to all of them. But systematic errors in your parameter uncertainty (such that you can look at a calibration graph and see the problem) should trigger an update in the general priors you’re using.
Here I am considering

$$\mathbb{E}_{p,\hat p}\Big[\, p \log\big(1 + f(\hat p)\big) + (1-p)\log\big(1 - f(\hat p)\big) \Big]$$

(notice the Kelly fraction depending on p̂ inside the utility but not outside). “What is my expected utility, if I bet according to Kelly given my estimate?” (Ans: Not Full Kelly)

I think you are talking about the scenario

$$\mathbb{E}_{\hat p}\Big[\, \hat p \log\big(1 + f(\hat p)\big) + (1-\hat p)\log\big(1 - f(\hat p)\big) \Big]$$

? (Ans: Full Kelly)
(Sorry, had trouble copying the formulae on greaterwrong)
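Since the formulae got mangled, here's the same comparison done numerically. A minimal Monte Carlo sketch; the even-odds setup and the Gaussian error model for p̂ are assumptions I'm making for illustration, not the paper's exact model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: true win probabilities p, and noisy estimates p_hat of them
p_true = rng.uniform(0.5, 0.9, size=200_000)
p_hat = np.clip(p_true + rng.normal(0.0, 0.1, size=p_true.shape), 0.01, 0.99)

def growth(f, p):
    # expected log growth of staking fraction f on an even-odds bet won with probability p
    return p * np.log(1 + f) + (1 - p) * np.log(1 - f)

for lam in [0.25, 0.5, 0.75, 1.0]:
    noisy = growth(lam * (2 * p_hat - 1), p_true).mean()   # bet sized from the estimate
    exact = growth(lam * (2 * p_true - 1), p_true).mean()  # bet sized from the true p
    print(f"lambda={lam:.2f}  sized from estimate: {noisy:+.4f}  sized from true p: {exact:+.4f}")
```

In the first column the best multiplier is below 1 (Not Full Kelly); in the second it is exactly 1 (Full Kelly).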
I think what you’re pointing to here is very much like the difference between unbiased estimators and Bayes-optimal estimators, right? Frequentists argue that unbiased estimators are better, because given any value of the true parameter, an unbiased estimator is in some sense doing a better job of approximating the right answer. Bayesians argue that Bayesian estimators are better, because of the bias-variance trade-off, and because you expect the Bayesian estimator to be more accurate in expectation (the whole point of accounting for the prior is to be more accurate in more typical situations).
I think the Bayesians pretty decisively win that particular argument; as an agent with a subjective perspective, you’re better off doing what’s best from within that subjective perspective. The Frequentist concept is optimizing based on a God’s-eye view, where we already know p. In this case, it leads us astray. The God’s-eye view just isn’t the perspective from which a situated agent should optimize.
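A toy version of that argument, with numbers I'm making up for illustration (a Beta(2,2) prior, five flips per experiment): averaged over the prior, the posterior mean beats the unbiased frequency estimate, even though the unbiased estimator can win pointwise for particular values of the parameter.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5  # flips per experiment

# Sample the parameter from the prior, then data given the parameter
p = rng.beta(2, 2, size=100_000)      # assumed prior, for illustration
heads = rng.binomial(n, p)

mle = heads / n                        # unbiased frequency estimate
post_mean = (heads + 2) / (n + 4)      # posterior mean under the Beta(2,2) prior

print("MSE of unbiased estimator:", np.mean((mle - p) ** 2))
print("MSE of posterior mean:    ", np.mean((post_mean - p) ** 2))
```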
Similarly, I think it’s just not right to optimize the formula you give, rather than the one you attribute to me. If I have parameter uncertainty, then my notion of the expected value of using fractional Kelly is going to come from sampling from my parameter uncertainty, and checking what the expected payoffs are for each sample.
But then, as you know, that would just select a Kelly fraction of 1.
So if that formula describes your reasoning, I think you really are making the “true probability” mistake, and that’s why you’re struggling to put it in terms that are less objectionable from the Bayesian perspective. (Which, again, I don’t think is always right, but which I think is right in this case.)
(FYI, I’m not really arguing against fractional Kelly; full Kelly really does seem too high in some sense. I just don’t think this particular argument for fractional Kelly makes sense.)
Consider a scenario where we’re predicting a lot of (different) sports events. We could both be perfectly calibrated (what you say happens 20% of the time happens 20% of the time, and so on), but I could be more “uncertain” with my predictions. If my prediction is always 50-50, I am calibrated, but I really shouldn’t be betting. This is about adjusting your strategy for this uncertainty.
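(To spell that out with the textbook Kelly formula, which I'll add here for concreteness: against a bet paying b-to-1, the Kelly stake is

$$f^* = \frac{p(b+1) - 1}{b},$$

so a calibrated-but-ignorant p = 1/2 still prescribes a positive bet whenever b > 1. Calibration alone doesn't make mechanically betting Kelly on my estimate a good idea.)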
I think what’s going on in this example is that you’re setting it up so that I know strictly more about sports than you. You aren’t willing to bet, because anything you know about the situation, I know better. In terms of your post, this is your second argument in favor of Kelly. And I think it’s the explanation here. I don’t think your meta-uncertainty has much to do with it.
Particularly if, as you posit, you’re quite confident that 50-50 is calibrated. You have no parameter uncertainty: your model is that of a fair coin, and you’re confident it’s the best model in the coin-flip model class.
BAYESIAN: Right… look, when I accepted the original Kelly argument, I wasn’t really imagining this circumstance where we face the exact same bet over and over. Rather, I was imagining I face lots of different situations. So long as my probabilities are calibrated, the long-run frequency argument works out the same way. Kelly looks optimal. So what’s your beef with me going “full Kelly” on those estimates?
No, my views here were always closer to BAYESIAN’s. I think we’re looking at a variety of different bets where my probabilities are calibrated but uncertain. Being calibrated isn’t the same as being right. I have always assumed here that you are calibrated.
Then you concede the major assumption of BAYESIAN’s argument here! Under the calibration assumption, we can show that the long-run performance of Kelly is optimal (in the peculiar sense of optimality usually applied to Kelly, that is).
I’m curious how you would try to apply something like your formula to the mixed-bet case (i.e., a case where you don’t have the same meta-uncertainty each time).
The strawman of your argument (where I’m struggling to understand how your position differs) is: “A Bayesian with log-utility is repeatedly offered bets (mechanism for choosing bets unclear) against an unfair coin. His prior on the probability that the coin comes up heads is uniform on [0,1]. He should bet Full Kelly with p = 1/2 (or slightly less than Full Kelly once he’s updated for the odds he’s offered).” I don’t think he should take any bets. (I’m guessing you would say that he would update his strategy each time to the point where he no longer takes any bets, but what would he do the first time? Would he take the bet?)
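(For concreteness, here's a sketch of what "bet Full Kelly on the posterior mean" would mechanically do in that strawman. I'm assuming even odds are offered every round; this is an illustration of the policy, not a recommendation.)

```python
import numpy as np

rng = np.random.default_rng(2)
p_coin = 0.3           # the coin's actual bias, unknown to the bettor
heads = tails = 0      # the uniform prior on [0,1] is Beta(1,1)
wealth = 1.0

for t in range(10):
    p_est = (heads + 1) / (heads + tails + 2)  # posterior mean, Beta(1+heads, 1+tails)
    f = 2 * p_est - 1   # full Kelly at even odds; negative f means staking |f| on tails
    won = rng.random() < p_coin
    wealth *= 1 + f if won else 1 - f
    heads, tails = heads + won, tails + (1 - won)
    print(f"round {t}: estimate={p_est:.3f}, stake={f:+.3f}, wealth={wealth:.3f}")
```

On the first round the posterior mean is 1/2, so at even odds "Full Kelly" stakes nothing; the bets only start once the posterior moves.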
Here’s how I would fix this strawman. Note that the fixed strawman is still straw in the sense that I’m not actually arguing for full Kelly, I’m just trying to figure out your argument against it.
“A Bayesian with log-utility is repeatedly offered bets (coming from a rich, complex environment which I’m making no assumptions about, not even computability). His probabilities are, however, calibrated. Then full Kelly will be optimal.”
Probably there are a few different ways to mathify what I mean by “optimal” in this argument. Here are some observations/conjectures:
Full Kelly optimizes the expected utility of this agent, obviously. So if the agent really has log utility, and really is a Bayesian, clearly it’ll go full Kelly.
After enough bets, since we’re calibrated, we can assume that the frequency of success for p=x bets will closely match x. So we can make the usual argument that full Kelly will be very close to optimal (see the sketch below):
- Fractional Kelly, or other modified Kelly formulas, will make less money.
- In general, any other strategy will make less money in the long run, under the assumption that long-run frequencies match probabilities, so long as that strategy does not contain further information about the world.
(For example, in your example where you have an ignorant but calibrated 50-50 model, maybe the true world is “yes on even-numbered dates, no on odd”. A strategy based on this even-odd info could outperform full Kelly, obviously. The claim is that so long as you’re not doing something like that, full Kelly will be approximately best.)
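Here's a quick simulation of that claim as I'd set it up (even odds, outcomes drawn from the very probabilities the agent quotes, so the agent is calibrated by construction; the setup is mine, for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n_bets = 100_000

# Calibrated by construction: each outcome is drawn from the quoted probability
p = rng.uniform(0.5, 0.95, size=n_bets)
won = rng.random(n_bets) < p

for lam in [0.25, 0.5, 0.75, 1.0]:
    f = lam * (2 * p - 1)  # (fractional) Kelly at even odds
    log_growth = np.where(won, np.log(1 + f), np.log(1 - f)).sum()
    print(f"lambda={lam:.2f}: total log growth = {log_growth:,.0f}")
```

Full Kelly (lambda = 1) comes out on top, because the quoted probabilities really are the frequencies.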
I think there’s something which I’ve not made clear, but I’m not 100% sure we’ve found what it is yet.
My current estimate is that this is 100% about the frequentist God’s-eye-view way of arguing, where you evaluate the optimality of something by supposing a “true probability” and thinking about how well different strategies do as a function of that.
If so, I’ll be curious to hear your defense of the gods-eye perspective in this case.
One thing I want to make clear is that I think there’s something wrong with your argument on consequentialist grounds.
Or maybe the graph is of a single step of Kelly investment, showing expected log returns? But then wouldn’t Kelly be optimal, given that Kelly maximizes log-wealth in expectation, and in this scenario the estimate p̂ is going to be right on average, when we sample from the prior?
Yeah, the latter. I will edit this to make it clearer. This is “expected utility” for one period (which is equivalent to the growth rate). I just took the chart from their paper and didn’t want to edit it. (Although that would have made things clearer. I think I’ll just generate the graph myself.)
Looking at the bit I’ve emphasised. No! This is the point.
I want to emphasize that I also think there’s something weird about your position on consequentialist grounds. As non-Bayesian as some arguments for Kelly are, we can fit the Kelly criterion into a Bayesian framework by supposing logarithmic utility. So a consequentialist can see those arguments as just indirect ways of arguing for logarithmic utility.
Not so with your argument here. If we assess a gamble as having probability p, then what could our model uncertainty have to do with anything? Model uncertainty can decrease our confidence that expected events will happen, but p already prices that in. Model uncertainty also changes how we’ll reason later, since we’ll update on the results here (and wouldn’t otherwise do so). But that doesn’t matter until later.
We’re saying: “Event A might happen, with probability p; event B might happen, with probability 1−p.” Our model uncertainty grants more nuance to this model by allowing us to update it on receiving more information; but in the absence of such an update, it cannot possibly be relevant to the consequences of our strategies in events A and B. Unless there’s some funny updateless stuff going on, which you’re clearly not supposing.
From a consequentialist perspective, then, it seems we’re forced to evaluate the expected utility in the same way whether we have meta-uncertainty or not.
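To put that in symbols (a one-line derivation; θ is the model parameter and π my prior over it, with even odds for simplicity): for a single bet on A, the parameter uncertainty integrates out before the decision,

$$P(A) = \int P(A \mid \theta)\, \pi(\theta)\, d\theta = p,$$

so the expected utility of staking a fraction f is

$$\mathbb{E}[u] = p\, u\big(W(1+f)\big) + (1-p)\, u\big(W(1-f)\big),$$

which depends on the prior only through p.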