Re 1: looking back at the subthread, yes, I think that was the source of much confusion. I did mean maximization in the long run and for quite a while did not realize that you and DanielLC were talking about the maximization in a single iteration.
Re 4: The expectation operator is just a weighted sum (in the discrete case) or an integral (in the continuous case). I don’t think it cares about the fatness of tails or whether some moments are defined or not.
Speaking generally, log(E(X)) is not the same thing as E(log(X)) (see Jensen’s Inequality), but that’s a different question. The question we have is: if you have some set of parameters theta that X is conditional on, does maximizing for E(X) lead you to different optimal thetas than maximizing for E(log(X))?
Re 6: Well, you have to be careful that Kelly Rule assumptions hold. It works as it works because capital growth is multiplicative, not additive, and because you expect to have many iterations of betting, for example.
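A quick simulation can illustrate why the multiplicative, many-iterations setting matters. This is a toy sketch with numbers of my own choosing (p = 0.6 chance of winning an even-money bet, k = 2, 1000 rounds), not anything specified in the thread:

```python
import math, random

# Toy Kelly setting (my illustrative numbers): win probability p, payout
# multiple k, betting a fixed fraction theta of the bankroll each round.
random.seed(1)
p, k, rounds = 0.6, 2.0, 1000

def final_log_wealth(theta):
    # Log of the final bankroll after `rounds` multiplicative bets.
    log_w = 0.0
    for _ in range(rounds):
        if random.random() < p:
            log_w += math.log(1 + theta * (k - 1))
        else:
            log_w += math.log(1 - theta)
    return log_w

w_kelly = final_log_wealth(0.2)       # Kelly fraction (pk-1)/(k-1) = 0.2
w_aggressive = final_log_wealth(0.9)  # overbetting
print(w_kelly, w_aggressive)
```

Because growth is multiplicative, it is the log of wealth that accumulates additively, and over many rounds the aggressive bettor’s bankroll almost surely shrinks toward zero even though each individual bet has positive expected value.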
The expectation operator doesn’t care about fatness of tails (well, it kinda doesn’t, but note that e.g. the expectation of a random variable with Cauchy distribution is undefined, precisely because of those very fat tails), but the theorem that says that in the long run your wealth is almost always close to its expectation may fail for fat-tail-related reasons.
In the present case where we’re looking at long-run results only, the answer might be “no” (but—see above—I’m not sure it actually is). But in general, if you allow X to be any random variable rather than some kind of long-run average of well-behaved things, it is absolutely not in any way true that maximizing E(X) leads to the same parameter choices as maximizing E(log(X)).
If you want your choice to be optimal, sure. But all I’m saying is that using “the Kelly rule” to mean “making the choice that maximizes expected log bankroll” seems like a reasonable bit of terminology. Whether using the Kelly rule, in this sense, is a good idea in any given case will of course depend on all sorts of details.
Good point about Cauchy. If even the mean is undefined, all bets are off :-)
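As a side note, the Cauchy pathology is easy to see numerically. A minimal sketch (the inverse-CDF sampling trick and the seed are my own choices):

```python
import math, random

# Sample means of a standard Cauchy never settle down: in fact the mean
# of n i.i.d. standard Cauchy draws is itself standard Cauchy for every n.
random.seed(2)

def cauchy_sample_mean(n):
    # If U ~ Uniform(0,1), then tan(pi*(U - 1/2)) is standard Cauchy.
    return sum(math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)) / n

means = [cauchy_sample_mean(10 ** k) for k in range(2, 6)]
print(means)
```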
Can I get an example? Say, X is a random positive real number. For which distribution which parameters that maximize E(X) will not maximize E(log(X))?
I don’t know about that. The Kelly Rule means a specific strategy in a specific setting and diluting and fuzzifying that specificity doesn’t seem useful.
That is exactly what the Kelly criterion provides examples of. Let p be the probability of winning some binary bet and k the multiple of your bet that is returned to you if you win. Given an initial bankroll of 1, let theta be the proportion of it you are going to bet. Let the distribution of your bankroll after the bet be X. With probability p, X is 1+theta(k-1), and with probability 1-p, X is 1-theta. theta is a parameter of this distribution. (So are p and k, but we are interested in maximising over theta for given p and k.)
If pk > 1 then theta = 1 maximises E(X), but theta = (pk-1)/(k-1) maximises E(log(X)).
The graphs of E(X) and E(log(X)) as functions of theta look nothing like each other. The first is a straight line with positive slope; the second rises to a maximum and then plunges to -∞.
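For concreteness, the two objectives can be compared with a simple grid search. A sketch with illustrative numbers of my own choosing (p = 0.6, k = 2, so pk = 1.2 > 1):

```python
import math

p, k = 0.6, 2.0  # my illustrative numbers; pk > 1, so betting is favourable

def expected_wealth(theta):
    # E(X) = p*(1 + theta*(k-1)) + (1-p)*(1 - theta), linear in theta
    return p * (1 + theta * (k - 1)) + (1 - p) * (1 - theta)

def expected_log_wealth(theta):
    # E(log X) = p*log(1 + theta*(k-1)) + (1-p)*log(1 - theta)
    return p * math.log(1 + theta * (k - 1)) + (1 - p) * math.log(1 - theta)

# Scan theta on a grid; stop short of 1 because log(1 - theta) blows up there.
grid = [i / 1000 for i in range(1000)]
best_ex = max(grid, key=expected_wealth)
best_elog = max(grid, key=expected_log_wealth)

kelly = (p * k - 1) / (k - 1)  # closed-form maximiser of E(log X)
print(best_ex, best_elog, kelly)  # E(X) is maximised at the grid edge, E(log X) at 0.2
```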
Yep, I was wrong. Now I need to figure out why I thought I was right…
May have gotten confused because log is monotonically increasing, e.g. log-likelihood is maximized at the same spot as likelihood, so log E(X) is maximized at the same spot as E(X). But log and E do not commute (Jensen’s inequality is not called Jensen’s equality, after all).
Was probably part of it—I think the internal cheering for the wrong position included the words “But log likelihood!” :-/
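The non-commutation itself is easy to confirm numerically: for any non-degenerate X, E(log X) falls strictly below log E(X). A minimal sketch (the sampled distribution is arbitrary):

```python
import math, random

# Monte Carlo check of Jensen's inequality for the concave function log.
random.seed(0)
xs = [random.uniform(0.1, 10.0) for _ in range(100_000)]

mean = sum(xs) / len(xs)
mean_log = sum(math.log(x) for x in xs) / len(xs)
print(mean_log, math.log(mean))  # E(log X) < log E(X)
```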
Sure. So, just to be clear, the situation is: We have real-valued random variable X depending on a single real-valued parameter t. And I claim it is possible (indeed, usual) that the choice of t that maximizes E(log X) is not the same as the choice of t that maximizes E(X).
My X will have two possible values for any given t, both with probability 1⁄2. They are t exp t and exp −2t.
E(log X) = 1⁄2 (log(t exp t) + log(exp −2t)) = 1⁄2 (log t + t − 2t) = 1⁄2 (log t − t). This is maximized at t=1. (It’s also undefined for t<=0; I’ll fix that in a moment.)
E(X) is obviously monotone increasing for large positive t, so it’s “maximized at t=+oo”. (It doesn’t have an actual maximum; I’ll fix that in a moment.)
OK, now let me fix those two parenthetical quibbles. I said X depends on t, but actually it turns out that t = 100.5 + 100 sin u, where u is an angle (i.e., varies mod 2pi). Now E(X) is maximized when sin u = 1, so for u = pi/2; and E(log X) is maximized when 100 sin u = −99.5 (i.e., t = 1), so for two values of u close to -pi/2. (Two local maxima, with equal values of E(log X).)
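The whole construction is easy to check by brute force, scanning u around the circle (the grid size is my choice):

```python
import math

# Brute-force check of the construction above: X is t*exp(t) or exp(-2t),
# each with probability 1/2, where t = 100.5 + 100*sin(u).

def t_of(u):
    return 100.5 + 100 * math.sin(u)

def e_x(u):
    t = t_of(u)
    return 0.5 * (t * math.exp(t) + math.exp(-2 * t))

def e_log_x(u):
    t = t_of(u)
    return 0.5 * (math.log(t) - t)  # log(t*exp(t)) + log(exp(-2t)) = log t - t

grid = [2 * math.pi * i / 100_000 for i in range(100_000)]
t_ex = t_of(max(grid, key=e_x))        # ~200.5, i.e. sin u = 1
t_elog = t_of(max(grid, key=e_log_x))  # ~1, i.e. 100 sin u = -99.5
print(t_ex, t_elog)
```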
Okay, I accept that I’m wrong and you’re right. Now the interesting part is that my mathematical intuition is not that great, but this is a pretty big fail even for it. So in between googling for crow recipes, I think I need to poke around my own mind and figure out which wrong turn it happily took… I suspect I got confused about the expectation operator, but to confirm I’ll need to drag my math intuition into the interrogation room and start asking it pointed questions.
Upvoted for public admission of error :-).
(In the unlikely event that I can help with the brain-fixing, e.g. by supplying more counterexamples to things, let me know.)
As a trivial example, let’s say you are choosing between distribution A and distribution B.
In distribution A, X=100 with probability 0.5, and X=epsilon with probability 0.5
In distribution B, X=10 with probability 1
The average value of X under distribution A is about 50 (exactly (100+epsilon)/2), whereas the average value of X under distribution B is 10. If you want to maximize E(X) you will therefore choose distribution A.
The average value of log X under distribution A goes to negative infinity as epsilon shrinks, whereas the average value of log X under distribution B is log 10 (which is 1 in base 10). If you want to maximize E(log X) you will choose distribution B.
Edited to add: The idea behind Von Neumann Morgenstern Expected Utility Theory is that optimizing your expected utility does not imply the same choices as optimizing the expected payoff. If you maximize for E(X) your utility function is risk neutral, if you maximize for E(log X) your utility function is risk averse etc. If maximizing these two expectations always implied identical choices, it would not be possible to define risk aversion.
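The A/B comparison can be checked directly. A minimal sketch, with epsilon set to an arbitrary small positive value:

```python
import math

epsilon = 1e-9  # stand-in for "a tiny positive number"

dist_a = [(0.5, 100.0), (0.5, epsilon)]  # (probability, value) pairs
dist_b = [(1.0, 10.0)]

def e_x(dist):
    return sum(p * x for p, x in dist)

def e_log_x(dist):
    return sum(p * math.log(x) for p, x in dist)

print(e_x(dist_a), e_x(dist_b))          # A wins on E(X): ~50 vs 10
print(e_log_x(dist_a), e_log_x(dist_b))  # B wins on E(log X)
```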