The problem with maximising expected utility is that Bob will sit there playing one more round, then another round, again and again, until he eventually loses everything. Each step maximises the expected utility, but the policy overall guarantees zero utility with certainty, assuming Bob never runs out of time.
But even as utility-maximising Bob is saved from self-destruction by the clock, he will think to himself, “damn it! Out of time. That is really annoying, I want to keep making this bet”.
At least to me, Kelly betting fits in the same kind of space as Newcomb's paradox and (possibly) the prisoner's dilemma. They all demonstrate that the optimal policy is not necessarily given by a sequence of optimal actions at every step.
Ignoring infinities, do you have the same objection to a game with a limit of 100 rounds? Utility-maximizing Bob will bet all his money 100 times, and lose all of it with probability around 1 − 10^−24, and he’ll endorse that because one time in 10^24 he is raking it in to the tune of 10^32 dollars or something. If you try to stop him he’ll be justly annoyed because you’re not letting him maximize his utility function.
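For a concrete sanity check on those figures: the thread doesn't pin down the game's odds, so assume, purely as an illustration, a double-or-nothing bet that Bob wins with probability 0.6, played 100 times while betting everything. That gives numbers in the same ballpark as the ones quoted:

```python
# Sanity check of the figures above. The game's odds are an assumption
# (double-or-nothing, won with probability 0.6), not stated in the thread.
p_win, rounds = 0.6, 100

p_survive = p_win ** rounds          # chance Bob wins every round, ~ 6.5e-23
jackpot = 2.0 ** rounds              # wealth multiplier if he does, ~ 1.3e30
expected = (2.0 * p_win) ** rounds   # expected wealth multiplier, ~ 8.3e7

print(f"P(broke) = 1 - {p_survive:.1e}")
print(f"jackpot  = {jackpot:.1e}x")
print(f"expected = {expected:.1e}x")
```

The expected-wealth multiplier of roughly 8 × 10^7 is exactly the quantity utility-maximizing Bob is optimizing, which is why he endorses the near-certain ruin.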
Do you think that’s a problem for expected utility maximization? If so, it seems to me that your objection isn’t “optimal policy doesn’t come from optimal actions”. (At any rate I think that would be a bad objection, because optimal policy for this utility function does come from optimal actions at each step.) Rather, it seems to me that your objection is you don’t really believe Bob has that utility function.
Which, of course he doesn’t! No one has a utility function like that (or, indeed, at all). And I think that’s important to realize. But it’s a different objection, and I think that’s important to realize too.
Yes, I completely agree that the main reason in real life we would recommend against that strategy is that we instinctively (and usually correctly) feel that the person’s utility function is sub-linear in money, so that the 10^32 dollars with probability 10^−24 is a bad deal. Obviously, if 10^32 dollars is needed to cure some disease that will otherwise kill them immediately, that changes things.
But there is an objection that I think runs somewhat separately to that, which is the round limit. If we are operating under an optimal, reasonable policy, then (outside commitment-tactic negotiations) I think it shouldn’t really be possible for a new outside constraint to improve our performance. If the constraint does improve performance, then we could have adopted that constraint voluntarily, and our policy was therefore not optimal. And the N-round limit is doing a fairly important job of improving Bob’s performance in this hypothetical. Otherwise Bob’s strategy is equivalent to “I bet everything, every time, until I lose it all.” Perhaps this second objection is just the old one in a new disguise (any agent with a finitely-bounded utility function would eventually reach a round number where they decide “actually, I have enough now”, and thus restore my sense of what should be), but I am not sure that it is exactly the same.
Oh, I don’t think the round limit is fundamental here, I just don’t like infinities :p
At time zero, you can show Bob a bunch of probability distributions for his money at some finite time t, corresponding to betting strategies, and ask which he’d prefer. And his answer will always be that his favorite distribution is the one corresponding to “bet everything every time”. And when it gets to time t, Bob is almost certainly broke, but not actually regretting his decisions in the sense of “knowing what I knew then I could have done better”.
If we take the limit as t→∞… I’m not really sure this is a meaningful thing to do. I guess we could take the pointwise limit and see that the resulting function is 1 at 0 and 0 everywhere else, which is indeed a probability distribution we don’t like. But if we take the pointwise limit of the Kelly strategy, it’s 0 everywhere, which isn’t even a probability distribution. I don’t think we should use that as a reason to prefer the Kelly strategy. Maybe there are other limits we can take? (I’ve forgotten a lot of what I used to know.) But mostly I think this is a weird thing to try to do.
If we’re not taking the limit, if we just say Bob can play as long as he wants, then yes, he just keeps playing until he goes broke. But he endorses that behavior. There’s no point where he looks back and goes “I was an idiot”.
One thing I’d say here is that we don’t sum up or compare utilities at different times. Like, it would be tempting to say “with probability 1, Bob will go broke. And however much money he had at the time, with probability 1, his alter ego Kelly-Betting Bob will eventually have more money than that. So Bob would prefer to be Kelly-Betting Bob”. But that last sentence doesn’t hold; Bob knows that in the event he’d managed to stick it out that long, his wealth would so vastly dwarf Kelly-Betting Bob’s that it was worth the risks he took.
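The asymmetry in that last sentence can be made concrete. Under the same illustrative assumptions as before (a hypothetical double-or-nothing bet won with probability 0.6, so the Kelly fraction is 2p − 1 = 0.2), Kelly-Betting Bob's typical wealth grows at a modest exponential rate, while Bob's wealth, conditional on not yet being broke, doubles every round:

```python
import math

# Illustrative game (not specified in the thread): double-or-nothing,
# win probability p = 0.6. For even-money odds the Kelly fraction is 2p - 1.
p = 0.6
f = 2 * p - 1  # 0.2

# Per-round expected log-growth for the Kelly bettor.
g = p * math.log(1 + f) + (1 - p) * math.log(1 - f)  # ~ 0.02 per round

n = 100
kelly_typical = math.exp(g * n)  # typical Kelly wealth multiplier, ~ 7.5x
bob_if_solvent = 2.0 ** n        # Bob's multiplier *given* he hasn't lost yet

print(kelly_typical, bob_if_solvent)
# Conditional on still being solvent, Bob vastly dwarfs Kelly-Betting Bob,
# which is why Bob does not regret his policy ex ante.
```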
I understand your point, and I think I am sort of convinced. But it’s the sort of thing where minor details in the model can change things quite a lot. For example, I am sort of assuming that Bob gets no utility at all from his money until he walks out of the casino with his winnings; i.e. having the money while still being in the casino is worth nothing to him, because he can’t buy stuff with it. You seem to be comparing Bob with his counterfactual at each round number, whereas I am only interested in Bob at the very end of the process, when he walks away with his winnings to get all that utility. But your proposed Bob never walks away from the table with any winnings (assuming no round limit). If he still has winnings, he doesn’t walk away.
Let’s put details on the scenario in two slightly different ways. (1) The “casino” is just a computer script where Bob can program in a strategy (bet it all every time) and then type in the number of rounds (N). (Or, for your version of Bob, put the whole thing in a “while my_money > 0:” loop.) (2) Alternatively, imagine that Bob is in the casino playing each round one at a time, and that the time taken for one round is a fixed utility cost of some small number (say 0.1). This doesn’t change anything for utility-maximising Bob; in fact, the time cost of one more round relative to his expected gains shrinks over time as his money doubles up (later rounds are a better deal in expectation).
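A minimal sketch of scenario (1), with scenario (2)'s per-round time cost bolted on. The game itself, the starting stake, and the cost per round are all illustrative assumptions (double-or-nothing at win probability 0.6), since the thread never fixes them:

```python
import random

# Scenario (1) as a script, plus scenario (2)'s fixed time cost per round.
# Game, stake, and cost are assumptions for illustration only.
P_WIN, TIME_COST, START = 0.6, 0.1, 1.0

def bet_it_all(rounds):
    """Utility-maximising Bob: stake everything, every round."""
    money, cost = START, 0.0
    for _ in range(rounds):
        if money <= 0:
            break  # the only exit besides the round limit
        cost += TIME_COST
        money = money * 2 if random.random() < P_WIN else 0.0
    return money - cost  # utility is only realised if he walks out with money

def kelly_bet(rounds):
    """Kelly-Betting Bob: stake the Kelly fraction 2p - 1 = 0.2 each round."""
    money, cost = START, 0.0
    for _ in range(rounds):
        cost += TIME_COST
        stake = 0.2 * money
        money += stake if random.random() < P_WIN else -stake
    return money - cost

random.seed(0)
ruined = sum(bet_it_all(100) <= 0 for _ in range(1000)) / 1000
print(ruined)  # ~ 1.0: over any long horizon, Bob walks out broke
```

Over a long horizon, essentially every run of bet_it_all ends with Bob broke and his time cost paid anyway.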
With these models, I just see a system where Bob deterministically loses all his money. The longer he goes before going bust, the more of his time he wastes as well (in model (2)).
Kelly betting doesn’t actually fix my complaint. A Kelly-betting Bob with no point at which he says “yes, that is enough money, time to leave” actually gets minus-infinity utility in model (2), where each round costs a small but finite amount of utility in terms of the time spent, because the money acquired doesn’t pay off until he leaves, which he never does.
I think maybe you are right that it comes down to the utility function. Any agent (even the Kelly one) will behave in a way that comes across as obviously insane if we allow their utility function to go to infinity. Although I still don’t quite see how that infinity actually ever enters in this specific case. If we answer the infinite utility function with an infinite number of possible rounds then we can say with certainty that Bob never walks away with any winnings.
I agree infinity is what makes things go weird here, but as you say, not particularly weirder for Bob than for Kelly-Betting Bob (who also never leaves the casino, and also wraps his strategy in a “while my_money > 0:” loop).
But what you say here seems to undermine your original comment:
The problem with maximising expected utility is that Bob will sit there playing one more round, then another round, again and again, until he eventually loses everything.
But KBB also sits there playing one more round, then another round. He doesn’t eventually lose everything, but he doesn’t leave either. This isn’t a problem with maximizing expected utility, it’s a problem with infinity.
At least to me, Kelly betting fits in the same kind of space as Newcomb's paradox and (possibly) the prisoner's dilemma. They all demonstrate that the optimal policy is not necessarily given by a sequence of optimal actions at every step.
But with this setup, it only demonstrates that if we wave our hands and talk about what happens after playing infinitely many rounds of a game we never want to stop playing.
If we aren’t talking about something like that, then optimal policy for the expected-money maximizer is given by taking the optimal action at every step.
Yes, my position did indeed shift, as you changed my mind and I thought about it in more depth. My original position was very much pro-Kelly. On thinking about your points, I now think it is the “while my_money > 0:” aspect where the problem really lies. I still stand by the distinction between optimal global policy and optimal action at each step, because at each step the optimal action (for Kelly or not) is to shake the dice one more time. But if this is taken as a policy, we arrive at the “while my_money > 0:” break condition being the only escape, which is clearly a bad policy. (It guarantees that in any world where we walk away, we walk away with nothing.)
Nod. I think we basically agree at this point. Certainly I don’t intend to claim that optimal policy and optimal actions always coincide (I have more thoughts on that but don’t want to get into them).