Since writing the original post, I’ve found Gwern’s post about a solution to something almost identical to Bob’s problem. In this post, he creates a decision tree for every possible move starting from the first one, determining final value at the leaf nodes. He then uses the Bellman equation and traditional expected value to back out what you should do in the earliest moves. The answer is that you bet approximately Kelly.
Gwern’s takeaway here is (I think) that expected value always works, but you have to make sure you’re solving the right problem. Using expected value naively at each step, discounting the temporal nature of the problem, leads to ruin.
I think many of the more philosophical points in my original post still stand, as doing backwards induction even on this toy problem is pretty difficult (it took his software 16 hours to find the solution). Collapsing a time series expected value problem to a one-shot Kelly problem saves a lot of effort, but to do that you need an ergodic statistic. Even once you’ve done that, you should still make sure the game is worth playing before you actually start betting.
it took his software 16 hours to find the solution
That’s just the maximally-inefficient-but-convenient interpreted version in R. For the Kelly Coin Flip Game, the fastest exact brute-force was 0.002h, not 16.000h, and it’d probably be less than half that if I ran it on my current 16-core machine instead of my laptop from 9 years ago. (For comparison, Feep & others got a similar speedup on another dynamic programming problem: taking it from the naive interpreted version of erroring out at problem sizes much past 300 due to memory usage problems to being able to solve problem sizes up to 133,787,000 in just 9 wallclock days. Quite something. And probably some of the tricks in the second problem could’ve been applied to speed up the first one even more.) And the real answer is that it takes 0.000h because Arthur found an exact formula which uses so few operations that I wasn’t sure how to benchmark it meaningfully beyond “seems to run in milliseconds” & so fast it looked like memoizing was slowing it down. (The original problem being too fast to compute is why I started making it harder by generalizing the problem.)
As usual, the convenient way to implement something is very rarely anywhere near the fastest, often by multiple orders of magnitude, and we must choose our poison: “fast, easy, general—pick two”.
I have no problem with the argument that ergodic formulas may be the limit of or provably identical to straightforward decision theory/reinforcement learning utility maximization over the actual decision problems rather than simplified strawmen, and may be convenient computational shortcuts. I just don’t find that very useful when relevant problems are finite enough that you lose a lot (eg in the coin-flip problem, KC loses a pretty substantial amount of money because even 300 rounds/years is still not enough for the convergence & often you need to act wildly different from KC), and often break the assumptions, and the ergodic stuff obscures all of this, completely ignoring what it’s a special-case of, and comes with a whole heap of puffery and PR.
Arthur found an exact formula which uses so few operations that I wasn’t sure how to benchmark it meaningfully
Oh, cool. I’ll have to read your post again more carefully.
rather than simplified strawmen
Myopic expectation maximization may be a bad argument, but I don’t think it’s a strawman. People do believe that you should expectation maximize on each step of a coin-flipping game, instead of over the full history of the game. They act on that belief and go bust, like 30% of the players in Haghani & Dewey. Those people would actually do better adopting an ergodic statistic.
I now understand that Bellman-based RL learns a value function that ends up maximizing expected value over a history rather than myopically. That doesn’t mean that any AI agent using expectation maximization will do this. In particular, I worry that people will wrap a world model in naive expectation maximization and end up with an agent that goes bust in resources. This seems like something people are actually trying to do with LLMs.
Oh, cool. I’ll have to read your post again more carefully.
Yeah, it’s one of those ‘kitchen sink’-type posts. The point is less any individual result than creating a zoo of ‘here are some of the many ways to tackle the problem, and what exotic flora & fauna we observe along the way’. You don’t get the effect if you just look at one or two points.
They act on that belief and go bust, like 30% of the players in Haghani & Dewey. Those people would actually do better adopting an ergodic statistic.
Well, they go bust, yes, and would do better with almost any other strategy (since you can’t do worse than winning $0). But I don’t recall Haghani & Dewey saying that the 30%-busters were all doing greedy EV maximization and betting their entire bankroll at each timestep...? (There are many ways to overbet which are not greedy EV maximization.)
In particular, I worry that people will wrap a world model in naive expectation maximization and end up with an agent that goes bust in resources. This seems like something people are actually trying to do with LLMs.
Inasmuch as they are imitation-learning from humans and planning, that seems like less of a concern in the long run. However, to the extent that there is any fundamental tendency towards myopia, that might be a good thing for safety. Inducing various kinds of ‘myopia’ has been a perennial proposal for AI safety: if the AI isn’t planning out sufficiently long-term because eg it has a very high discount rate, then that reduces a lot of instrumental convergence pressure or reward-hacking potential—because all of that misbehavior is outside the planning window. (An ‘oracle AI’ can be seen as an extreme version where it cares about only the next time-step, in which it returns an answer.)
If we’re already sacrificing max utility to create a myopic agent that’s lower risk, why would we not also want it to maximize temporal average rather than ensemble average to reduce wipeout risk?
No, it isn’t. Gwern never says that anywhere, and it’s not true. This is a good example of what I’m saying.
For clarity, the game is this. You start with $25 and you can bet any multiple of $0.01 up to the amount you have. A coin is flipped with a 60⁄40 bias in your favour. If you win you double the amount you bet, otherwise you lose it. There is a cap of $250, so after each bet you lose any money over this amount (so in fact you should never make a bet that could take you over). This continues for 300 rounds.
Bob’s edge is 20%, so the Kelly criterion would recommend that he bets $5. If he continues to use the Kelly criterion in every round (except if this would take him over the cap, in which case he bets to take him to the cap) he ends with an average of $238.04.
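Spelling that out (for an even-money payoff the Kelly fraction is simply the edge):

```latex
% Kelly fraction for an even-money bet, win probability p = 0.6, loss probability q = 0.4:
f^* \;=\; p - q \;=\; 0.6 - 0.4 \;=\; 0.2,
\qquad \text{bet} \;=\; f^* \times \$25 \;=\; \$5.
```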
As explained on the page you link to, the optimal strategy and expected value can be calculated inductively based on the number of bets remaining. The optimal starting bet is $1.99, and if you continue to bet optimally your average amount of money is $246.61.
So in this game the optimal starting bet is only about 40% of the Kelly bet. The Kelly strategy bets too riskily, and leaves $8.57 on the table compared to the optimal strategy.
Kelly isn’t optimal in any limit either. As the number of rounds goes to infinity, the optimal strategy is to bet just $0.01, since this maximises the likelihood of never going bankrupt. If instead the cap goes to infinity then the optimal strategy is to bet everything on every round. Of course you could tune the cap and the number of rounds together so that Kelly was optimal on the first bet, but then it still wouldn’t be optimal for subsequent bets.
(EDIT: It’s actually not certain that the optimal strategy in the first round is $1.99, since floating point accuracy in the computations becomes relevant and many starting bets give the same result. But $5 is so far from optimum that it genuinely did give a lower expected value, so we can say for certain that Kelly is not optimal.)
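For anyone who wants to reproduce that induction, here is a minimal backward-induction sketch in Python. It works in whole dollars rather than $0.01 increments so it runs in well under a minute, which means its output only approximates the exact $1.99 / $246.61 figures above.

```python
# Backward induction over (wealth, rounds remaining), whole-dollar granularity.
P_WIN, CAP, START, ROUNDS = 0.6, 250, 25, 300

V = [float(w) for w in range(CAP + 1)]  # 0 rounds left: value = current wealth
first_bet = 0
for rounds_left in range(1, ROUNDS + 1):
    new_V, best_bet = [0.0] * (CAP + 1), [0] * (CAP + 1)
    for w in range(CAP + 1):
        best = V[w]                      # betting nothing is always an option
        for bet in range(1, w + 1):
            ev = P_WIN * V[min(w + bet, CAP)] + (1 - P_WIN) * V[w - bet]
            if ev > best:
                best, best_bet[w] = ev, bet
        new_V[w] = best
    V, first_bet = new_V, best_bet[START]

# With $1 steps this should land near the exact answers ($1.99 and $246.61).
print(first_bet, round(V[START], 2))
```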
Hmm. I think we might be misunderstanding each other here.
When I say Gwern’s post leads to “approximately Kelly”, I’m not trying to say it’s exactly Kelly. I’m not even trying to say that it converges to Kelly. I’m trying to say that it’s much closer to Kelly than it is to myopic expectation maximization.
Similarly, I’m not trying to say that Kelly maximizes expected value. I am trying to say that expected value doesn’t summarize wipeout risk in a way that is intuitive for humans, and that those who expect myopic expected values to persist across a time series of games in situations like this will be very surprised.
I do think that people making myopic decisions in situations like Bob’s should in general bet Kelly instead of expected value maximizing. I think an understanding of what ergodicity is, and whether a statistic is ergodic, helps to explain why. Given this, I also think that it makes sense to ask whether you should be looking for bets that are more ergodic in their ensemble average (like index funds rather than poker).
In general, I find expectation maximization unsatisfying because I don’t think it deals well with wipeout risk. Reading Ole Peters helped me understand why people were so excited about Kelly, and reading this article by Gwern helped me understand that I had been interpreting expectation maximization in a very limited way in the first place.
In the limit of infinite bets like Bob’s with no cap, myopic expectation maximization at each step means that most runs will go bankrupt. I don’t find the extremely high returns in the infinitesimally probable regions to make up for that. I’d like a principled way of expressing that which doesn’t rely on having a specific type of utility function, and I think Peters’ ergodicity economics gets most but not all the way there.
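To make that concrete, here is a rough Monte Carlo sketch (it assumes the 60/40 game above with 300 rounds and no cap, which is my reading of “like Bob’s”): the bet-everything policy that greedy expectation maximization implies goes bust in essentially every run, while a 20% Kelly bettor’s median wealth grows.

```python
import random, statistics

P_WIN, ROUNDS, RUNS, START = 0.6, 300, 10_000, 25.0

def play(policy):
    w = START
    for _ in range(ROUNDS):
        bet = policy(w)
        if bet <= 0:
            break                          # bankrupt, so no further bets
        w += bet if random.random() < P_WIN else -bet
    return w

myopic = [play(lambda w: w) for _ in range(RUNS)]        # stake everything
kelly = [play(lambda w: 0.2 * w) for _ in range(RUNS)]   # stake 20% of bankroll

print("myopic: fraction bankrupt =", sum(w == 0 for w in myopic) / RUNS)
print("kelly:  median final wealth =", round(statistics.median(kelly), 2))
```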
Other than that, I don’t disagree with anything you’ve said.
I don’t find the extremely high returns in the infinitesimally probable regions to make up for that. I’d like a principled way of expressing that which doesn’t rely on having a specific type of utility function
This sounds impossible to me? Like, if we’re talking about agents with a utility function, then either that function is such that extremely high returns make up for extremely low probabilities, or it’s such that they don’t. If they do, there’s no argument you can make that this agent is mistaken, they simply value things differently than you. If you want to argue that the high returns aren’t worth the low probability, you’re going to need to make assumptions about their utility function.
I admit that I don’t know what ergodicity is (and I bounce off the wiki page). But if I put myself in the shoes of Bob whose utility function is linear in money… my anticipation is that he just doesn’t care. Like, you explain what ergodicity is to him, and point out that the process he’s following is non-ergodic. And he replies that yes, that’s true; but on the other hand, the process he’s following does optimize his expected money, which is the only thing he cares about. And there’s no ergodic process that maximizes his expected money. So he’s just going to keep on optimizing for the thing he cares about, thanks, and if you want to give up some expected money in exchange for ergodicity, that’s your right.
It’s not clear to me that it’s impossible, and I think it’s worth exploring the idea further before giving up on it. In particular, I think that saying “optimizing expected money is the thing that Bob cares about” assumes the conclusion. Bob cares about having the most money he can actually get, so I don’t see why he should do the thing that almost-surely leads to bankruptcy. In the limit as the number of bets goes to infinity, the probability of not being bankrupt will converge to 0. It’s weird to me that something of measure 0 probability can swamp the entirety of the rest of the probability.
I’d say that “optimizing expected money is the only thing Bob cares about” is an example, not an assumption or conclusion. If you want to argue that agents should care about ergodicity regardless of their utility function, then you need to argue that to the agent whose utility function is linear in money (and has no other terms, which I assumed but didn’t state in the previous comment).
Such an agent is indifferent between a certainty of 10^25 dollars, and a near-certainty of 0 dollars with a 10^-67 chance of 10^92 dollars. That’s simply what it means to have that utility function. If you think this agent, in the current hypothetical scenario, should bet Kelly to get ergodicity, then I think you just aren’t taking seriously what it means to have a utility function that’s linear in money.
In the limit as the number of bets goes to infinity
I spoke about limits and infinity in my conversation with Ben, my guess is it’s not worth me rehashing what I said there. Though I will add that I could make someone whose utility is log in money—i.e. someone who’d normally bet Kelly—behave similarly.
Not with quite the same setup. But I can offer them a sequence of bets such that with near-certainty (p→1 as t→∞), they’d eventually end up with $0.01 and then stop betting because they’ll under no circumstances risk going down to $0.
These bets can’t be of the form “payout is some fixed multiple of your stake and you get to choose your stake”, but I think it would work if I do “payout is exponential in your stake”. Or I could just say “minimum stake is your entire bankroll minus $0.01”—if I offer high enough payouts each time, they’ll take these bets, over and over, until they’re down to their last cent. Each time they’d prefer a smaller bet for less money, but if I’m not offering that they’d rather take the bet I am offering than not bet at all.
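To put a number on the “bankroll minus $0.01” version: a quick sketch (assuming the same 60% win probability and a $25 bankroll, both of which are my assumptions rather than part of the setup above) of how large the winning payout has to be before a log-utility agent prefers taking the bet to not betting.

```python
import math

def min_payout(wealth, p=0.6, floor=0.01):
    """Smallest winning payout x such that staking (wealth - floor) satisfies
    p*log(wealth + x) + (1 - p)*log(floor) >= log(wealth)."""
    target = (math.log(wealth) - (1 - p) * math.log(floor)) / p
    return math.exp(target) - wealth

print(round(min_payout(25.00), 2))  # roughly $4,600: above this, the bet is accepted
```

The same function applies at whatever wealth they hold after each win, so as long as the offered payouts stay large enough they keep betting until a loss leaves them at $0.01.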
Also,
It’s weird to me that something of measure 0 probability can swamp the entirety of the rest of the probability.
The Dirac delta has this property too, and IIUC it’s a fairly standard tool.
Here we’re talking about something that’s weird in a different way, and perhaps weird in a way that’s harder to deal with. But again I think that’s more because of infinity than because of utility functions that are linear in money.
If instead the cap goes to infinity then the optimal strategy is to bet everything on every round.
This isn’t right unless I’m missing something—Kelly provides the fastest growth, while betting everything on every round is almost certain to bankrupt you.
If you’re trying to maximize expected money at the end of a fixed number of rounds, you do that by betting everything on every round (and, yes, almost certainly going bankrupt).
If that’s not what you’re trying to do, the optimal strategy is probably something else. But “how do we maximize expected money?” seems to be the question Gwern’s post is exploring. It’s just that with the $250 cap, maximizing expected money seems like a good idea (because you can almost always get close to $250), and with no cap, maximizing expected money seems like a terrible idea (because it gives you a 10^-67 chance of $10^92).
You don’t do Kelly because it’s good at maximizing expected money. You do it (when you do it) because you’re trying to do something other than maximize expected money.
Oh, I see. Yes, I agree. The idea to maximize the expected money would never occur to me (since that’s not how my utility function works), but I get it now.
It bankrupts you with probability 1 − 0.6^300, but in the other 0.6^300 of cases you get a sweet sweet $25 × 2^300. This nets you an expected $1.42 × 10^25.
Whereas Kelly betting only has an expected value of $25 × (0.6×1.2 + 0.4×0.8)^300 = $3,220,637.15.
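(Checking the arithmetic in these two comments:)

```python
p, rounds, start = 0.6, 300, 25

ev_all_in = start * 2**rounds * p**rounds              # ≈ 1.42e25
win_prob = p**rounds                                   # ≈ 3e-67
ev_kelly = start * (p * 1.2 + (1 - p) * 0.8)**rounds   # ≈ 3,220,637.15

print(f"{ev_all_in:.3g} {win_prob:.3g} {ev_kelly:,.2f}")
```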
Obviously humans don’t have linear utility functions, but my point is that the Kelly criterion still isn’t the right answer when you make the assumptions more realistic. You actually have to do the calculation with the actual utility function.
So, by optimal, you mean “almost certainly bankrupt you.” Then yes.
My definition of optimal is very different.
Obviously humans don’t have linear utility functions
I don’t think that’s the only reason—if I value something linearly, I still don’t want to play a game that almost certainly bankrupts me.
Obviously humans don’t have linear utility functions, but my point is that the Kelly criterion still isn’t the right answer when you make the assumptions more realistic.
I mean, that’s not obvious—the Kelly criterion gives you, in the example with the game, E(money) = $240, compared to $246.61 with the optimal strategy. That’s really close.
I don’t think that’s the only reason—if I value something linearly, I still don’t want to play a game that almost certainly bankrupts me.
I still think that’s because you intuitively know that bankruptcy is worse-than-linearly bad for you. If your utility function were truly linear then it’s true by definition that you would trade an arbitrary chance of going bankrupt for a tiny chance of a sufficiently large reward.
I mean, that’s not obvious—the Kelly criterion gives you, in the example with the game, E(money) = $240, compared to $246.61 with the optimal strategy. That’s really close.
Yes, but the game is very easy, so a lot of different strategies get you close to the cap.
Yes, but the game is very easy, so a lot of different strategies get you close to the cap.
I’ve been thinking about it, and I’m not sure if this is the case in the sense you mean it—expected money maximization doesn’t reflect human values at all, while the Kelly criterion mostly does, so if we make our assumptions more realistic, it should move us away from expected money maximization and towards the Kelly criterion, as opposed to moving us the other way.