I understand your point, and I think I am sort of convinced. But it’s the sort of thing where minor details in the model can change things quite a lot. For example, I am sort of assuming that Bob gets no utility at all from his money until he walks out of the casino with his winnings, i.e. having the money and still being in the casino is worth nothing to him, because he can’t buy stuff with it. Whereas you seem to be comparing Bob with his counterfactual at each round number, while I am only interested in Bob at the very end of the process, when he walks away with his winnings to get all that utility. But your proposed Bob never walks away from the table with any winnings (assuming no round limit): if he still has winnings, he doesn’t walk away.
Let’s put details on the scenario in two slightly different ways. (1) The “casino” is just a computer script where Bob can program in a strategy (bet it all every time), and then just type in the number of rounds (N). (Or, for your version of Bob, put the whole thing in a “while my_money > 0:” loop.) (2) Alternatively, imagine that Bob is in the casino playing each round one at a time, and that the time taken to play one round is a fixed utility cost of some small number (say 0.1). This doesn’t change anything for utility-maximising Bob; in fact, the time cost of one more round shrinks relative to his expected gains as his money doubles up, so later rounds are a better deal in expectation.
With these models I just see a system where Bob deterministically loses all his money. The longer he goes before going bust, the more of his time he wastes as well (in (2)).
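To make the two models concrete, here is a minimal Python sketch. The game itself is my assumption, since nothing above pins it down: each round is double-or-nothing on the whole stake, won with probability p_win (0.6 is an arbitrary illustrative value). The function names are hypothetical, and so is the max_rounds cap, which only exists so the loop cannot run forever in the vanishingly unlikely case of an endless winning streak.

```python
import random


def play_fixed_rounds(n_rounds, p_win=0.6, start=1.0, rng=None):
    """Model (1): bet everything each round, stop after n_rounds, walk away."""
    rng = rng or random.Random()
    money = start
    for _ in range(n_rounds):
        if money <= 0:
            break  # bust: nothing left to bet
        # Assumed game: double-or-nothing on the whole stake.
        money = money * 2 if rng.random() < p_win else 0.0
    return money


def play_until_broke(p_win=0.6, start=1.0, round_cost=0.1,
                     max_rounds=1000, rng=None):
    """The 'while my_money > 0' Bob, with the per-round time cost of model (2).

    Returns (money on leaving, total time cost paid).
    """
    rng = rng or random.Random()
    my_money = start
    rounds = 0
    # max_rounds is a safety cap for the sketch, not part of Bob's policy.
    while my_money > 0 and rounds < max_rounds:
        my_money = my_money * 2 if rng.random() < p_win else 0.0
        rounds += 1
    return my_money, rounds * round_cost
```

play_fixed_rounds is model (1) with a typed-in N. play_until_broke is the loop version: short of the safety cap, the only way out of the loop is my_money == 0, and the time cost paid grows with every round survived.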
Kelly betting doesn’t actually fix my complaint. A Kelly-betting Bob with no point at which he says “Yes, that is enough money, time to leave” actually gets minus infinity utility in model (2), where playing a round costs a small but finite amount of utility in terms of the time spent, because the money acquired doesn’t pay off until he leaves, which he never does.
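A rough sketch of that accounting, under assumptions of mine: an even-money bet won with probability p_win, for which the Kelly fraction is 2 * p_win - 1, and a flat round_cost of 0.1 per round as in model (2). The function names are hypothetical. Money held inside the casino is tracked but contributes nothing to utility, since this Bob never leaves.

```python
import random


def kelly_fraction(p_win):
    """Kelly fraction for an even-money bet: f = p - (1 - p) = 2p - 1."""
    return max(0.0, 2 * p_win - 1)


def kelly_bob_never_leaves(n_rounds, p_win=0.6, start=1.0,
                           round_cost=0.1, seed=0):
    """Play n_rounds of Kelly betting without ever cashing out.

    Returns (money still on the table, utility realized so far).
    """
    rng = random.Random(seed)
    f = kelly_fraction(p_win)
    money = start
    realized_utility = 0.0  # money only counts once Bob walks out
    for _ in range(n_rounds):
        stake = f * money
        money += stake if rng.random() < p_win else -stake
        realized_utility -= round_cost  # time spent on this round
    return money, realized_utility
```

The realized utility is just -round_cost * n_rounds, which falls without bound as n_rounds grows, however large the money on the table gets.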
I think maybe you are right that it comes down to the utility function. Any agent (even the Kelly one) will behave in a way that comes across as obviously insane if we allow their utility function to go to infinity. Although I still don’t quite see how that infinity actually ever enters in this specific case. If we answer the infinite utility function with an infinite number of possible rounds then we can say with certainty that Bob never walks away with any winnings.
I agree infinity is what makes things go weird here, but as you say, not particularly weirder for Bob than for Kelly-Betting Bob (who also never leaves the casino, and is also wrapped in a while my_money > 0 loop).
But what you say here seems to undermine your original comment:
The problem with maximising expected utility is that Bob will sit there playing 1 more round, then another 1 more round, again and again until he eventually loses everything.
But KBB also sits there playing one more round, then another round. He doesn’t eventually lose everything, but he doesn’t leave either. This isn’t a problem with maximizing expected utility, it’s a problem with infinity.
At least to me Kelly betting fits in the same kind of space as Newcomb’s paradox and (possibly) the prisoner’s dilemma. They all demonstrate that the optimal policy is not necessarily given by a sequence of optimal actions at every step.
But with this setup, it only demonstrates that if we wave our hands and talk about what happens after playing infinitely many rounds of a game we never want to stop playing.
If we aren’t talking about something like that, then the optimal policy for the expected-money maximizer is given by taking the optimal action at every step.
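For a finite round count N this can be written out directly. Assuming (my assumption; the thread never fixes the odds) a double-or-nothing bet won with probability $p > 1/2$ and starting money $m_0$, betting everything for N rounds gives:

```latex
\Pr[\text{not bust after } N \text{ rounds}] = p^{N}, \qquad
\mathbb{E}[\text{money after } N \text{ rounds}] = m_0 \, (2p)^{N}.
```

Each extra round multiplies expected money by $2p > 1$, so for any fixed N the bet-everything action at each step is also the expected-money-maximizing policy. Only in the limit $N \to \infty$ does the survival probability $p^{N}$ go to 0 while the expectation diverges.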
Yes, my position did indeed shift, as you changed my mind and I thought about it in more depth. My original position was very much pro-Kelly. On thinking about your points I now think it is the while my_money > 0 aspect where the problem really lies. I still stand by the distinction between the optimal global policy and the optimal action at each step, because at each step the optimal action (for Kelly or not) is to shake the dice another time. But if this is taken as a policy, we arrive at the while my_money > 0 break condition being the only escape, which is clearly a bad policy. (It guarantees that in any world where we walk away, we walk away with nothing.)
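One way to see the difference a leave condition makes is a sketch like the following. It is an illustration under assumptions of mine, not something proposed above: an even-money bet won with probability p_win, and a hypothetical target at which Bob decides that this is enough money and leaves. fraction=1.0 is bet-it-all Bob; a smaller fraction such as 0.2 is a Kelly-style Bob.

```python
import random


def play_with_target(target, fraction=1.0, p_win=0.6, start=1.0,
                     max_rounds=10_000, rng=None):
    """Bet `fraction` of current money each round; leave on reaching `target`.

    Returns (money on leaving, rounds played). Money of 0.0 means bust.
    """
    rng = rng or random.Random()
    my_money = start
    rounds = 0
    # Two exits now: going broke, or deciding that this is enough money.
    # max_rounds is only a safety cap for the sketch.
    while 0 < my_money < target and rounds < max_rounds:
        stake = fraction * my_money
        my_money += stake if rng.random() < p_win else -stake
        rounds += 1
    return my_money, rounds
```

With a target, some worlds end with Bob walking away holding winnings. With only the while my_money > 0 condition, every exit is the bust exit.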
Nod. I think we basically agree at this point. Certainly I don’t intend to claim that optimal policy and optimal actions always coincide (I have more thoughts on that but don’t want to get into them).