I haven’t read the material extensively (I’ve skimmed it), but here’s what I think is wrong with the time-average-vs-ensemble-average argument and my attempt to steelman it.
It seems very plausible to me that you’re right about the question-begging nature of Peters’s version of the argument; it seems like by maximizing expected growth rate, you’re just maximizing expected log wealth.
But I also think he’s trying to point at something real.
In the presentation where he uses the 1.5x/0.6x bet example, Peters shows how “expected utility over time” is an increasing line (this is the “ensemble average”—averaging across possibilities at each time step), whereas the actual payout for any individual player, plotted in log-wealth, looks like a straight downward line once we zoom out over enough iterations. There’s no funny business here—yes, he’s taking a log, but that’s just the best way of graphing the phenomenon. It’s still true that you lose almost surely if you keep playing this game longer and longer.
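To see the gap concretely, here’s a minimal simulation of the game (my sketch, assuming the standard setup: a fair coin multiplying wealth by 1.5 or 0.6 each round; the numbers and names are mine, not Peters’s):

```python
import numpy as np

rng = np.random.default_rng(0)
n_players, n_rounds = 50_000, 100
# Each round, a fair coin multiplies each player's wealth by 1.5 or 0.6.
factors = rng.choice([1.5, 0.6], size=(n_players, n_rounds))
final_wealth = factors.prod(axis=1)  # starting wealth = 1

# Ensemble average grows: E[factor] = 0.5*1.5 + 0.5*0.6 = 1.05 per round...
print("analytic mean:", 1.05 ** n_rounds)         # ~131.5
print("sample mean:  ", final_wealth.mean())      # noisy; carried by rare lucky paths
# ...but the typical player decays: per-round factor sqrt(1.5*0.6) ≈ 0.949.
print("sample median:", np.median(final_wealth))  # ~0.9**50 ≈ 0.005
```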
This is a real phenomenon. But, how do we formalize an alternative optimization criterion from it? How do we make decisions in a way which “aggregates over time rather than over ensemble”? It’s natural to try to formalize something in log-wealth space since that’s where we see a straight line, but as you said, that’s question-begging.
Well, a (fairly general) special case of log-wealth maximization is the Kelly criterion. How do people justify that? Wikipedia’s current “proof” section includes a heuristic argument which runs roughly as follows:
Imagine you’re placing bets in the same way a large number of times, N.
By the law of large numbers, the frequency of wins and losses approximately equals their probabilities.
Optimize total wealth at time N under the assumption that the frequencies equal the probabilities. You get the Kelly criterion.
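Here’s that heuristic in symbols (my sketch of the Wikipedia-style derivation, not a quote of it): bet a fraction f of wealth each round on an event with win probability p, at payout odds b:1, and assume the N rounds contain exactly pN wins and (1−p)N losses.

```python
import sympy as sp

f, b, p, N = sp.symbols('f b p N', positive=True)
# Log of terminal wealth along the "typical" sequence:
#   W_N = W_0 * (1 + f*b)**(p*N) * (1 - f)**((1-p)*N)
typical_log_wealth = N * (p * sp.log(1 + f * b) + (1 - p) * sp.log(1 - f))

# Maximize over f by setting the derivative to zero:
f_star = sp.solve(sp.diff(typical_log_wealth, f), f)[0]
print(sp.simplify(f_star))  # (b*p + p - 1)/b, i.e. the Kelly fraction p - (1-p)/b
```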
Now, it’s easy to see this derivation and think “Ah, so the Kelly criterion optimizes your wealth after a large number of steps, whereas expected utility only looks one step ahead”. But, this is not at all the case. An expected money maximizer (EMM) thinking long-term will still take risky bets. Observe that (in the investment setting in which Kelly works) the EMM strategy for a single step doesn’t depend on the amount of money you have—you either put all your money in the best investment, or you keep all of your money because there are no good investments. Therefore, the payout of the EMM in a single step is some multiple C of the amount of money it begins that step with. Therefore, an EMM looking one step further ahead just values its winnings at the end of the first step at C times face value—but this doesn’t change its behavior, since multiplying everything by C doesn’t change what the max-expectation strategy will be. Similarly, two-step lookahead only modifies things by C², and so on. So an EMM looking far ahead behaves just like one maximizing its holdings in the very next step.
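A quick check of the factorization argument (my sketch; the +0.5/−0.4 returns are just the earlier coin example applied to a staked fraction, not anything from the derivation):

```python
import numpy as np

# Bet a fraction f of wealth each round: the per-round factor is 1 + 0.5*f
# (heads) or 1 - 0.4*f (tails), each with probability 1/2.
fs = np.linspace(0, 1, 101)
one_step_expectation = 0.5 * (1 + 0.5 * fs) + 0.5 * (1 - 0.4 * fs)  # = 1 + 0.05*f

# With i.i.d. rounds, n-step expected wealth is (one-step expectation)**n,
# so the argmax over f never depends on the horizon: it's all-in (f = 1).
for horizon in (1, 2, 10, 100):
    print(horizon, fs[np.argmax(one_step_expectation ** horizon)])  # 1.0 each time
```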
The trick in the analysis is the way we replace a big sum over lots of possible ways things could go with a single “typical” outcome. This might initially seem like a mere computational convenience—after all, the vast, vast majority of possible sequences have approximately the expected win/loss frequencies. Here, though, it makes all the difference, because it eliminates from consideration the worlds which carry the highest weight in the EMM analysis—the worlds where things go really well and the EMM gets exponentially much money.
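To make the arithmetic concrete (my numbers, using the same 1.5x/0.6x game): expected wealth after N flips is (0.5·1.5 + 0.5·0.6)^N = 1.05^N, while the typical sequence has N/2 of each outcome and pays (1.5·0.6)^(N/2) = 0.9^(N/2) ≈ 0.949^N. Nearly all of that 1.05^N expectation is contributed by exponentially rare heads-heavy sequences; discard them, as the heuristic derivation does, and the apparent value of the game flips from exponential growth to exponential decay.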
OK, so, is the derivation just a mistake?
I think many English-language justifications of the Kelly criterion or log-wealth maximization are misleading or outright wrong. I don’t think we can justify it as an analysis of the best long-term strategy, because the analysis rules out any sequence other than those with the most probable statistics, which isn’t a move motivated by long-term analysis. I don’t think we can even justify it as “time average rather than ensemble average” because we’re not time-averaging wealth. Indeed, the whole point is supposedly to deal with the non-ergodic cases; but non-ergodic systems don’t have unique time-averaged behavior!
However, I ultimately find something convincing about the analysis: namely, from an evolutionary perspective, we expect to eventually find that only (approximate) log-wealth maximizers remain in the market (with non-negligible funds).
This conclusion is perfectly compatible with expected utility theory as embodied by the VNM axioms et cetera. It’s an argument that market entities will tend to have utility=log(money), at least approximately, at least in common situations which we can expect strategies to be optimized for. More generally, there might be an argument that evolved organisms will tend to have utility=log(resources), for many notions of resources.
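A sketch of that selection effect (my construction: fixed-fraction bettors all facing the same random sequence of the earlier coin’s returns):

```python
import numpy as np

rng = np.random.default_rng(1)
returns = rng.choice([0.5, -0.4], size=100_000)    # per-round return on the staked fraction
fractions = np.array([0.0, 0.25, 0.5, 0.75, 1.0])  # competing fixed strategies
# Kelly for this game is f = 0.25 (it maximizes 0.5*log(1+0.5f) + 0.5*log(1-0.4f)).

# Every bettor faces the SAME outcome sequence; compare terminal log wealth.
log_wealth = np.log1p(np.outer(returns, fractions)).sum(axis=0)
shares = np.exp(log_wealth - log_wealth.max())
print(dict(zip(fractions.tolist(), (shares / shares.sum()).round(6))))
# essentially all market wealth ends up with the f = 0.25 (Kelly) bettor
```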
However, maybe Nassim Nicholas Taleb would rebuke us for this tepid and timid conclusion. In terms of pure utility theory, applying a log before taking an expectation is a distinction without a difference—we were allowed any utility function we wanted from the start, so requiring an arbitrary transform means nothing. For example, we can “solve” the St. Petersburg paradox by claiming our utility is the log of money—but we can then re-create the paradox by putting all the numbers in the game through an exponential function! So what’s the point? We should learn from our past mistakes, and choose a framework which won’t be prone to those same errors.
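A quick sanity check of that re-creation (standard St. Petersburg payoffs; the arithmetic is mine):

```python
import math

# Standard St. Petersburg: payoff 2**k with probability 2**-k, k = 1, 2, ...
# Expected money diverges, but expected log-money converges to 2*ln(2):
print(sum(math.log(2.0 ** k) * 2.0 ** -k for k in range(1, 60)))  # ≈ 1.386

# Exponentiate the payoffs to e**(2**k) and the paradox returns: each term of
# expected log-money is (2**k) * 2**-k = 1, so the sum grows without bound.
print(sum((2.0 ** k) * 2.0 ** -k for k in range(1, 60)))          # = 59.0
```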
So, can we steelman the claims that expected utility theory is wrong? Can we find a decision procedure which is consistent with Peters’s general idea, but isn’t just log-wealth maximization?
Well, let’s look again at the Kelly-criterion analysis. Can we make that into a general-purpose decision procedure? Can we get it to produce results incompatible with VNM? If so, is the procedure at all plausible?
As I’ve already mentioned, there isn’t a clear way to apply the law-of-large-numbers trick in non-ergodic situations, because there is not a unique “typical” set of frequencies which emerges. Can we do anything to repair the situation, though?
I propose that we maximize the median outcome. This gives a notion of “typical” which does not rely on an application of the law of large numbers, so it’s fine if the statistics of our sequence don’t converge to a single unique point. If they do converge, the median will evaluate things from that point. So, it’s a workable generalization of the principle behind Kelly betting.
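A sketch of what that looks like on the earlier coin game (my code; the grid and horizon are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n_paths, n_rounds = 2_000, 2_000
returns = rng.choice([0.5, -0.4], size=(n_paths, n_rounds))

# For each candidate betting fraction, estimate the MEDIAN terminal log wealth
# and pick the best. No law-of-large-numbers step is assumed anywhere.
grid = np.linspace(0, 1, 21)
medians = [np.median(np.log1p(f * returns).sum(axis=1)) for f in grid]
print(grid[int(np.argmax(medians))])  # ≈ 0.25: the Kelly fraction re-emerges
```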
The median also relates to something mentioned in the OP:
> I’ve felt vaguely confused for a long time about why expected value/utility is the right way to evaluate decisions; it seems like I might be more strongly interested in something like “the 99th percentile outcome for the overall utility generated over my lifetime”.
The median is the 50th percentile, so there you go.
Maximizing the median indeed violates VNM:
It’s discontinuous. Small differences in probability can change the median outcome by a lot: a gamble paying $100 with probability p and nothing otherwise has median $0 for p just below 1/2, and median $100 for p just above it. Maybe this isn’t so bad—who really cares about continuity, anyway? Yeah, seemingly small differences in probability create “unjustified” large differences in perceived quality of a plan, but only in circumstances where outcomes are sparse enough that the median is not very “informed”.
It violates independence, in a more obviously concerning way. A median-maximizer doesn’t care about “outlier” outcomes. It’s indifferent between the following two plans, which seems utterly wrong:
A plan with 100% probability of getting you $100
A plan with 60% probability of getting you $100, and 40% probability of getting you killed.
Both of these concerns become negligible as we take a long-term view. The longer into the future we look, the more outcomes there will be, making the median more robust to shifting probabilities. Similarly, a median-maximizer is indifferent between the two options above, but if you consider the iterated game, it will strongly prefer the global strategy of always selecting the first option.
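To spell out both halves with a toy model (mine, with death crudely scored as $0 and as ending the iterated game):

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths = 100_000

# One-shot: Plan A pays $100 for sure; Plan B pays $100 w.p. 0.6, death w.p. 0.4.
plan_b = np.where(rng.random(n_paths) < 0.6, 100, 0)
print(np.median(plan_b))  # 100.0: the median is indifferent between A and B

# Iterated 20 times, always choosing B: you collect $2000 only if you survive
# every round, which happens with probability 0.6**20 ≈ 3.7e-5. The median
# payoff of always-B is therefore $0, versus $2000 for always-A.
alive = (rng.random((n_paths, 20)) < 0.6).all(axis=1)
print(np.median(np.where(alive, 2000, 0)))  # 0.0
```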
Still, I would certainly not prefer to optimize median value myself, or create AGI which optimizes median value. What if there’s a one-shot situation which is similar to the 40%-death example? I think I similarly don’t want to maximize the 99th percentile outcome, although this is less clearly terrible.
Can we give an evolutionary argument for median utility, as a generalization of the evolutionary argument for log utility? I don’t think so. The evolutionary argument relies on the law of large numbers, to say that we’ll almost surely end up in a world where log-maximizers prosper. There’s no similar argument that we almost surely end up in the “median world”.
So, all told:
I don’t think there’s a good argument against expectation-maximization here.
But I do think those who think there is should consider median-maximization, as it’s an alternative to expectation-maximization which is consistent with much of the discussion here.
I basically buy the argument that utility should be log of money.
I don’t think it’s right to describe the whole thing as “time-average vs ensemble-average”, and suspect some of the “derivations” are question-begging.
I do think there’s an evolutionary argument which can be understood from some of the derivations, however.
I now like the “time vs ensemble” description better. I was trying to understand everything coming from a Bayesian frame, but actually, all of these ideas are more frequentist.
In a Bayesian frame, it’s natural to think directly in terms of a decision rule. I didn’t think time-averaging was a good description because I didn’t see a way for an agent to directly replace ensemble average with time average, in order to make decisions:
Ensemble averaging is the natural response to decision-making under uncertainty; you’re averaging over different possibilities. When you try to time-average to get rid of your uncertainty, you have to ask “time average what?”—you don’t know what specific situation you’re in.
In general, the question of how to turn your current situation into a repeated sequence for the purpose of time-averaging analysis seems under-determined (even if you are certain about your present situation). Surely Peters doesn’t want us to use actual time in the analysis; in actual time, you end up dead and lose all your money, so the time-average analysis is trivial.
Even if you settle on a way to turn the situation into an iterated sequence, the necessary limit does not necessarily exist. This is also true of the possibility-average, of course (the St. Petersburg paradox being a classic example); but it seems easier to get failure in the time-average case, because you just need non-convergence; i.e., you don’t need any unbounded stuff to happen.
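For instance (my example), a bounded payoff sequence with non-convergent time average: alternate +1 and −1 payoffs in blocks of exponentially growing length, and the running average oscillates between roughly +1/3 and −1/3 forever.

```python
import numpy as np

# Blocks of sizes 1, 2, 4, ..., 2**19 with alternating sign.
blocks = [np.full(2 ** k, 1.0 if k % 2 == 0 else -1.0) for k in range(20)]
x = np.concatenate(blocks)
running_avg = np.cumsum(x) / np.arange(1, len(x) + 1)

# The average at the end of each block keeps swinging; no limit exists.
print(running_avg[2 ** 19 - 2])  # ≈ +1/3 (end of a +1 block)
print(running_avg[-1])           # ≈ -1/3 (end of a -1 block)
```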
However, all of these points are also true of frequentism:
Frequentist approaches start from the objective/external perspective rather than the agent’s internal uncertainty. They don’t want to define probability in terms of a subjective viewpoint; they want probability to be defined as the limiting frequency you would get if you repeated an experiment over and over. The fact that you don’t have direct access to these frequencies is a natural consequence of not having direct access to objective truth.
Even given direct access to objective truth, frequentist probabilities are still under-defined because of the reference class problem—what infinite sequence of experiments do you conceive of your experiment as part of?
And, again, once you select a sequence, there’s no guarantee that a limit exists. Frequentism has to solve this by postulating that limits exist for the kinds of reference classes we want to talk about.
So, I now think what Ole Peters is working on is frequentist decision theory. Previously, the frequentist/Bayesian debate was about statistics and science, but decision theory was predominantly Bayesian. Ole Peters is working out the natural theory of decision making which frequentists could/should have been pursuing. (So, in that sense, it’s much more than just a new argument for Kelly betting.)
Describing frequentist-vs-Bayesian as time-averaging vs possibility-averaging (aka ensemble-averaging) seems perfectly appropriate.
So, on my understanding, Ole’s response to the three difficulties could be:
We first understand the optimal response to an objectively defined scenario; then, once we’ve done that, we can concern ourselves with the question of how to actually behave given our uncertainty about what situation we’re in. This is not trying to be a universal formula for rational decision making in the same way Bayesianism attempts to be; you might have to do some hard work to figure out enough about your situation in order to apply the theory.
And when we design general-purpose techniques, much like when we design statistical tests, our question should be whether, given an objective scenario, the decision-making technique does well—the same way frequentists want estimates to be unbiased. Bayesians instead want decisions and estimates to be optimal given our uncertainty.
As for how to turn your situation into an iterated game, Ole can borrow the frequentist response of not saying much about it.
As for the existence of a limit, Ole actually says quite a bit about how to fiddle with the math until you’re dealing with a quantity for which a limit exists. See his lecture notes. On page 24 (just before section 1.3) he talks briefly about finding an appropriate function of your wealth such that you can do the analysis. Then, section 2.7 says much more about this.
The general idea is that you have to choose an analysis which is appropriate to the dynamics. Additive dynamics call for additive analysis (examining the time-average of wealth). Multiplicative dynamics call for multiplicative analysis (examining the time-average of growth, as in Kelly betting and similar settings). Other settings call for other functions. Multiplicative dynamics are common in financial theory because so much financial theory is about investment, but if we examine the financial decisions of those living on income, the analysis has to be very different.
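In code (my sketch of the multiplicative case): the ergodic observable is the log growth rate, and its time average along a single trajectory converges to something quite different from the ensemble growth rate.

```python
import numpy as np

rng = np.random.default_rng(4)
factors = rng.choice([1.5, 0.6], size=1_000_000)  # one long multiplicative trajectory

# Multiplicative analysis: time-average the per-round log growth.
print(np.mean(np.log(factors)))  # ≈ 0.5*ln(1.5) + 0.5*ln(0.6) ≈ -0.0527 (decay)
# Ensemble analysis: the expected per-round factor is 1.05 (growth).
print(np.log(1.05))              # ≈ +0.0488
```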
> So, can we steelman the claims that expected utility theory is wrong? Can we find a decision procedure which is consistent with Peters’s general idea, but isn’t just log-wealth maximization?
Yes. As I’ve pointed out before, a lot of these problems go away if you simply solve the actual problem instead of a pseudo-problem. Decision theory, and Bayesian decision theory, has no problem with multi-step processes, like POMDPs/MDPs—or at least, I have yet to see anyone explain what, if anything, of Peters/Taleb’s ‘criticisms’ of expected-value goes away if you actually solve the corresponding MDP. (Bellman did it better 70 years ago.)
I like the “Bellman did it better” retort ;p

FWIW, I remain pretty firmly in the expected-utility camp; but I’m quite interested in looking for cracks around the edges, and exploring possibilities.
I agree that there’s no inherent decision-theory issue with multi-step problems (except for the intricacies of tiling issues!).
However, the behavior of Bayesian agents with utility linear in money, on the Kelly-betting-style iterated investment game, for a high number of iterations, seems viscerally wrong. I can respect treating it as a decision-theoretic counterexample, and looking for decision theories which don’t “make that mistake”. I’m interested in seeing what the proposals look like.