I have some interesting disagreements with this.

Prescriptive vs Descriptive

First and foremost: you and I have disagreed in the past on wanting descriptive vs prescriptive roles for probability/decision theory. In this case, I’d paraphrase the two perspectives as:
Prescriptive-pure-Bayes: as long as we’re maximizing an expected utility, we’re “good”, and it doesn’t really matter which utility. But agents maximizing many of those utilities will throw away all their money with probability close to 1, so Kelly isn’t prescriptively correct.
Descriptive-pure-Bayes: as long as we’re not throwing away money for nothing, we’re implicitly maximizing an expected utility. Maximizing typical (i.e. modal/median/etc) long-run wealth is presumably incompatible with throwing away money for nothing, so presumably a typical-long-run-wealth-maximizer is also an expected utility maximizer. (Note that this is nontrivial, since “typical long-run wealth” is not itself an expectation.) Sure enough, the Kelly rule has the form of expected utility maximization, and the implicit utility is logarithmic.
In particular, this is relevant to:
> Remember all that stuff about how a Bayesian money-maximizer would behave? That was crazy. The Bayesian money-maximizer would, in fact, lose all its money rather quickly (with very high probability). Its in-expectation returns come from increasingly improbable universes. Would natural selection design agents like that, if it could help it?
“Does Bayesian utility maximization imply good performance?” is mainly relevant to the prescriptive view. “Does good performance imply Bayesian utility maximization?” is the key descriptive question. In this case, the latter would say that natural selection would indeed design Bayesian agents, but that does not mean that every Bayesian agent is positively selected—just that those designs which are positively selected are (approximately) Bayesian agents.
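To make that contrast concrete, here is a minimal simulation sketch. The setup and numbers are illustrative choices of mine, not anything from the post or from Peters: a repeated even-money bet with win probability p = 0.6, comparing an all-in expected-wealth maximizer against a Kelly bettor who stakes the fraction 2p - 1 (and thereby implicitly maximizes expected log wealth).

```python
import random

# Toy setup (illustrative, not from the post): a repeated even-money bet that
# wins with probability p = 0.6. The Kelly fraction for an even-money bet is
# f* = 2p - 1 = 0.2; a pure expected-wealth maximizer stakes everything.

def simulate(fraction, p=0.6, rounds=200, w0=1.0):
    """Wealth after repeatedly staking `fraction` of current wealth."""
    w = w0
    for _ in range(rounds):
        stake = fraction * w
        w += stake if random.random() < p else -stake
    return w

def median(xs):
    xs = sorted(xs)
    return xs[len(xs) // 2]

random.seed(0)
all_in = [simulate(1.0) for _ in range(10_000)]  # "money-maximizer": bet everything
kelly = [simulate(0.2) for _ in range(10_000)]   # Kelly bettor: stake f* = 0.2

# The all-in bettor's true expected wealth is w0 * (2p)^rounds, which is
# astronomically large, but it comes entirely from the ~0.6^200 chance of never
# losing, so in any feasible sample its observed wealth is 0: it goes broke.
# The Kelly bettor's typical wealth grows like exp(rounds * E[log return]).
print("all-in: median %.3g, fraction broke %.3f"
      % (median(all_in), sum(w == 0.0 for w in all_in) / len(all_in)))
print("kelly : median %.3g, mean %.3g"
      % (median(kelly), sum(kelly) / len(kelly)))
```

The median here is exactly the “typical long-run wealth” above: the all-in strategy’s in-expectation returns live entirely in worlds far too rare to ever show up in a sample, which is the “increasingly improbable universes” point.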
“Natural” → Symmetry
> Peters makes much of this idea of what’s “natural”. He talks about additive problems vs multiplicative problems, as well as the more general case (when neither additive/multiplicative work).
> However, as far as I can tell, this boils down to creatively choosing a function which makes the math work out.
I haven’t read Peters, but the argument I see in this space is about symmetry/exchangeability (similar to some of de Finetti’s stuff). Choosing a function which makes reward/utility additive across timesteps is not arbitrary; it’s making utility have the same symmetry as our beliefs (in situations where each timestep’s variables are independent, or at least exchangeable).
In general, there’s a whole cluster of theorems which say, roughly: if a function f(x_1, ..., x_n) is invariant under re-ordering its inputs, then it can be written as f(x_1, ..., x_n) = g(∑_i h(x_i)) for some g, h. This includes, for instance, the classification of finite abelian groups as direct sums of modular additions, or de Finetti’s Theorem, or expressing symmetric polynomials in terms of power-sum polynomials. Addition is, in some sense, a “standard form” for symmetric functions.
Suppose we have a sequence of n bets. Our knowledge is symmetric under swapping the bets around, and our terminal goals don’t involve the bets themselves. So, our preferences should be symmetric under swapping the bets around. That implies we can write them in the “standard form”, i.e. we can express our preferences as a function of a sum of some summary data about each bet.
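As a concrete instance of that standard form in the betting setting (the function names and numbers below are mine, purely for illustration, and the bets are taken to be multiplicative with positive gross returns): final wealth is symmetric under re-ordering the bets and factors through a sum of per-bet summaries with h = log.

```python
import math
import random

# Illustrative only: take x_i to be the (positive) gross return of bet i.
# Final wealth f(x_1, ..., x_n) = w0 * x_1 * ... * x_n is symmetric under
# re-ordering the bets, and it is exactly of the standard form
# g(sum_i h(x_i)) with per-bet summary h = log and g = w0 * exp(.).

def final_wealth(returns, w0=1.0):
    """f(x_1, ..., x_n): symmetric in its inputs."""
    w = w0
    for x in returns:
        w *= x
    return w

def standard_form(returns, w0=1.0):
    """g(sum_i h(x_i)) with h = log, g = w0 * exp(.)."""
    return w0 * math.exp(sum(math.log(x) for x in returns))

random.seed(1)
returns = [random.choice([1.2, 0.8]) for _ in range(10)]
shuffled = returns[:]
random.shuffle(shuffled)

assert abs(final_wealth(returns) - standard_form(returns)) < 1e-9
assert abs(final_wealth(returns) - final_wealth(shuffled)) < 1e-9  # order irrelevant
```

In this multiplicative setup, any preference over final wealth only ever sees the bets through ∑_i log(x_i), which is one way the logarithm shows up before any utility function has been mentioned.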
I’m not seeing the full argument yet, but it feels like there’s something in roughly that space. Presumably it would derive a de Finetti-style exchangeability-based version of Bayesian reasoning.
I agree with your prescriptive vs descriptive thing, and agree that I was basically making that mistake.
I think the correct position here is something like: expected utility maximization; and also, utility in “these cases” is going to be close to logarithmic. (I.e., if you evolve trading strategies in something resembling Kelly’s conditions, you’ll end up with something resembling Kelly agents. And there’s probably some generalization of this which plausibly abstracts aspects of the human condition.)
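A toy sketch of that parenthetical claim (the selection rules and parameters are an illustrative construction, not something from the post or from Peters): “evolve” a population of fixed-fraction bettors under repeated Kelly-style conditions, keeping the wealthiest each generation, and the surviving fractions drift toward the Kelly fraction 2p - 1.

```python
import random

# Toy "evolution" (illustrative only): a population of fixed-fraction bettors
# faces a repeated even-money bet with p = 0.6. Each generation we keep the
# wealthiest half, refill with slightly mutated copies, and repeat. Selection
# on realized wealth pushes the surviving fractions toward the Kelly fraction
# 2p - 1 = 0.2, i.e. toward log-utility behaviour.

P_WIN = 0.6
BETS_PER_GEN = 300
POP_SIZE = 200

def realized_wealth(fraction, rng):
    w = 1.0
    for _ in range(BETS_PER_GEN):
        stake = fraction * w
        w += stake if rng.random() < P_WIN else -stake
    return w

def evolve(generations=60, seed=0):
    rng = random.Random(seed)
    population = [rng.random() for _ in range(POP_SIZE)]  # betting fractions in [0, 1]
    for _ in range(generations):
        ranked = sorted(population, key=lambda f: realized_wealth(f, rng), reverse=True)
        survivors = ranked[: POP_SIZE // 2]
        children = [min(1.0, max(0.0, f + rng.gauss(0.0, 0.02))) for f in survivors]
        population = survivors + children
    return population

final = evolve()
print("mean surviving fraction: %.2f (Kelly fraction: %.2f)"
      % (sum(final) / len(final), 2 * P_WIN - 1))
```

This is only a caricature of natural selection, but it points in the direction of the claim: what survives selection on realized wealth looks like a log-utility (Kelly) agent rather than a wealth-expectation maximizer.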
But note how piecemeal and fragile this sounds. One layer is the relatively firm expectation-maximization layer. On top of this we add another layer (based on maximization of mode/median/quantile, so that we can ignore things not true with probability 1) which argues for some utility functions in particular.

Ole Peters is trying to re-found decision theory on the basis of this second layer alone. I think this is basically a good instinct:
It’s good to try to firm up this second layer, since just Kelly alone is way too special-case, and we’d like to understand the phenomenon in as much generality as possible.
It’s good to try to make a 1-layer system rather than a 2-layer one, so that our principles are as unified as possible. The Kelly idea is consistent with our foundation of expectation maximization, sure, but if “realistic” agents systematically avoid some utility functions, that makes expectation maximization a worse descriptive theory. Perhaps there is a better one.
This is similar to the way Solomonoff is a two-layer system: there’s a lower layer of probability theory, and then on top of that, there’s the layer of algorithmic information theory, which tells us to prefer particular priors. In hindsight this should have been “suspicious”; logical induction merges those two layers together, giving a unified framework which gives us (approximately) probability theory and also (approximately) algorithmic information theory, tying them together with a unified bounded-loss notion. (And also implies many new principles which neither probability theory nor algorithmic information theory gave us.)
So although I agree that your descriptive lens is the better one, I think that lens has similar implications.
As for your comments about symmetry—I must admit that I tend to find symmetry arguments to be weak. Maybe you can come up with something cool, but I would tend to predict it’ll be superseded by less symmetry-based alternatives. For one thing, it tends to be a two-layered thing, with symmetry constraints added on top of more basic ideas.