It’s very annoying trying to have this conversation without downvotes. Anyway, here are some sentences.
This is not quite the St. Petersburg paradox; in the St. Petersburg setup, you don’t get to choose when to quit, and the confusion is about how to evaluate an opportunity which apparently has infinite expected value. In this setup the option “always continue playing” has infinite expected value, but even if you toss it out there are still countably many options left, namely “quit playing after N victories,” each of which has higher expected value than the last, and it’s still unclear how to pick between them.
Utility not being linear in money is a red herring here; you can just replace money with utility in the problem directly, as long as your utility function is unbounded. One resolution is to argue that this sort of phenomenon suggests that utility functions ought to be bounded. (One way of concretizing what it means to have an unbounded utility function: you have an unbounded utility function if and only if there is a sequence of outcomes each of which is at least “twice as good” as the previous in the sense that you would prefer a 50% chance of the better outcome and a 50% chance of some fixed outcome to a 100% chance of the worse outcome.)
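One way to see why the "twice as good" criterion captures unboundedness (above) is the short derivation below. Here x_0 is the fixed outcome, u(x_1) > u(x_0) is assumed, and "prefer" is read as weak preference. That reading is my assumption, not something stated in the comment.

```latex
\begin{align*}
\tfrac12 u(x_{n+1}) + \tfrac12 u(x_0) \ge u(x_n)
  &\iff u(x_{n+1}) - u(x_0) \ge 2\bigl(u(x_n) - u(x_0)\bigr) \\
  &\implies u(x_n) - u(x_0) \ge 2^{\,n-1}\bigl(u(x_1) - u(x_0)\bigr) \quad (n \ge 1).
\end{align*}
```

The right-hand side grows without bound, so u is unbounded. Conversely, if u is unbounded above, you can always find a next outcome with u(x_{n+1}) ≥ 2u(x_n) − u(x_0), so such a sequence exists.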
Thinking about your possible strategies before you start playing this game, there are infinitely many: for every nonnegative integer N, you can choose to stop playing after N rounds, or you can choose to never stop playing. Each strategy is more valuable than the previous one, and the last strategy ("never stop playing") has infinite expected value. If you state the question in terms of utilities, that means there's some sense in which the naive expected utility maximizer is doing the right thing, if it has an unbounded utility function.
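To make the ordering of the finite strategies concrete, here is a minimal Python sketch. The rules are hypothetical and are my assumption, not necessarily the game under discussion: each round you win with probability `p_win`, which multiplies your holdings by `multiplier`, and a loss ends the game with payoff 0. Whenever `p_win * multiplier > 1`, the expected value of "quit after N wins" grows without bound in N. The names `expected_value_quit_after`, `P_WIN`, `MULTIPLIER` and `STAKE` are illustrative only. The sketch makes no claim about the "never stop playing" strategy. With an unbounded utility function, the payoffs can be read directly as utilities.

```python
from fractions import Fraction

def expected_value_quit_after(n_wins: int, p_win: Fraction,
                              multiplier: Fraction, stake: Fraction) -> Fraction:
    """Expected payoff of the (hypothetical) strategy "quit after n_wins victories".

    Assumed rules: a win multiplies current holdings by `multiplier`,
    a loss ends the game with payoff 0.
    """
    # You walk away with stake * multiplier**n_wins only if the first
    # n_wins rounds are all wins; otherwise you get 0.
    return (p_win ** n_wins) * stake * (multiplier ** n_wins)

# Hypothetical parameters: a fair coin, winnings tripled on each win.
P_WIN = Fraction(1, 2)
MULTIPLIER = Fraction(3)
STAKE = Fraction(1)

values = [expected_value_quit_after(n, P_WIN, MULTIPLIER, STAKE) for n in range(6)]
for n, v in enumerate(values):
    print(f"quit after {n} wins: EV = {v} ({float(v):.3f})")

# Each finite strategy beats the one before it, so no finite N is optimal.
assert all(a < b for a, b in zip(values, values[1:]))
```

Because `P_WIN * MULTIPLIER` is 3/2 here, the expected value is (3/2)^N: every choice of N is beaten by N + 1.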
On the other hand, the foundational principled argument for taking expected utility maximization seriously as an (arguably toy) model of good decision-making is the vNM theorem, and in the setup of the vNM theorem lotteries (probability distributions over outcomes) always have finite expected utility, because 1) the utility function always takes finite values; an infinite value violates the continuity axiom, and 2) lotteries are only ever over finitely many possible states of the world. In this setup, without a finite bound on the total number of rounds, the possible states of the world are given by possible sequences of coin flips, of which there are uncountably many, and the lottery you would need to consider in order to decide how good it would be to never stop playing involves all of them. So, you can either reject the setup because the vNM theorem doesn't apply to it, or reject the vNM theorem because you want to understand decision making over infinitely many possible outcomes; in the latter case there's no a priori reason to talk about expected utility maximization. (This point also applies to the St. Petersburg paradox.)
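To illustrate the finiteness point, here is a minimal Python sketch under the same kind of hypothetical rules, which are my assumption: a fair coin where heads multiplies holdings by `multiplier` and tails ends the game with payoff 0. With a bound of `n_rounds` on the number of rounds, the states of the world are the 2**n_rounds flip sequences, so the strategy "quit after n_rounds wins" induces a lottery with finite support and a finite expected utility, as in the vNM setup. Dropping the bound would require enumerating infinite flip sequences, of which there are uncountably many, and this construction no longer applies. The helper names and the default linear utility are hypothetical.

```python
from fractions import Fraction
from itertools import product

def finite_lottery(n_rounds: int, multiplier: Fraction,
                   stake: Fraction) -> dict[Fraction, Fraction]:
    """Lottery (outcome -> probability) for "quit after n_rounds wins" in a
    hypothetical fair-coin game: heads multiplies holdings, tails pays 0."""
    lottery: dict[Fraction, Fraction] = {}
    # Bounded horizon: exactly 2**n_rounds equally likely flip sequences.
    for flips in product("HT", repeat=n_rounds):
        prob = Fraction(1, 2 ** n_rounds)
        won_every_round = all(f == "H" for f in flips)
        payoff = stake * multiplier ** n_rounds if won_every_round else Fraction(0)
        lottery[payoff] = lottery.get(payoff, Fraction(0)) + prob
    return lottery

def expected_utility(lottery: dict[Fraction, Fraction],
                     utility=lambda x: x) -> Fraction:
    # Finite support, finite utilities, so this is a finite sum.
    # The default (linear) utility is an assumption for illustration.
    return sum(prob * utility(outcome) for outcome, prob in lottery.items())

lot = finite_lottery(3, Fraction(3), Fraction(1))
print(lot)                    # two outcomes: 27 with probability 1/8, 0 with probability 7/8
print(expected_utility(lot))  # 27/8
```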
If you want to understand decision making over infinitely many possible outcomes, you run into a much more basic problem which has nothing to do with expected values: suppose I offer you a choice among a sequence of possible outcomes, each of which is strictly more valuable than the previous one (and this can happen even with a bounded utility function as long as it takes infinitely many values, although, again, there's no a priori reason to talk about expected utility maximization in this setting). Which one do you pick?
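A bounded utility function that still takes infinitely many values is enough to produce the problem. Here is a minimal Python sketch using the hypothetical utility u(n) = 1 − 1/2**n for the n-th offered outcome. This particular function is my choice for illustration. It is strictly increasing and bounded by 1 but never attains 1, so every pick is strictly beaten by the next one.

```python
from fractions import Fraction

def utility(n: int) -> Fraction:
    # Hypothetical bounded utility for the n-th offered outcome: 1 - 1/2**n.
    # Strictly increasing in n, always strictly less than 1.
    return 1 - Fraction(1, 2 ** n)

def strictly_better_alternative(n: int) -> int:
    # Whatever outcome n you pick, outcome n + 1 is strictly better.
    return n + 1

for pick in [0, 1, 10, 100]:
    better = strictly_better_alternative(pick)
    # No pick is optimal, even though all utilities lie below the bound 1.
    assert utility(pick) < utility(better) < 1
    print(pick, utility(pick), "<", utility(better))
```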
Thank you for this clear and useful answer!