Flipping Out: The Cosmic Coinflip Thought Experiment Is Bad Philosophy
TL;DR: People often use the thought experiment of flipping a coin, giving 50% chance of huge gain and 50% chance of losing everything, to say that maximizing utility is bad. But the real problem is that our intuitions on this topic are terrible, and there’s no real paradox if you adopt the premise in full.
Epistemic status: confident, but too lazy to write out the math
There’s a thought experiment that I’ve sometimes heard as a counterargument to strict utilitarianism. A god/alien/whatever offers to flip a coin. Heads, it slightly-more-than-doubles the expected utility in the world. Tails, it obliterates the universe. An expected-utility maximizer, the argument goes, keeps taking this bet until the universe goes poof. Bad deal.
People seem to love citing this thought experiment when talking about Sam Bankman-Fried. We should have known he was wrong in the head, critics sigh, when he said he’d bet the universe on a coinflip. They have a point; SBF apparently talked about this a lot, and it came up in his trial. I’m not fully convinced he understood the implications, and he certainly had a reckless and toxic attitude towards risk.
But today I’m here to argue that, despite his many, many flaws, SBF got this one right.
There is a lot of value in the universe
Suppose I’m a utilitarian. I value things like the easing of suffering and the flourishing of sapient creatures. Some mischievous all-powerful entity offers me the coinflip deal. On one side is the end of the world. On the other is “slightly more than double everything I value.” What does that actually mean?
It turns out the world is pretty big. There is a lot of flourishing in it. There’s also a lot of suffering, but I happen to arrange my preference-ordering such that the net utility of the world continuing to exist is extremely large. To make this coinflip an appealing trade, the Cosmic Flipper has to offer me something whose value is commensurate to that of the whole entire world and everyone in it, plus all the potential future value in humanity’s light cone.
That’s a big freaking deal.
The number of offers that weigh heavily enough on the other side of the scale is pretty darn small. “Double the number of people in the world” doesn’t begin to come close; neither does “make everyone twice as happy.” A more appropriate offer IMO might look more like “everyone becomes unaging, doesn’t need to eat or drink except for fun, grows two standard deviations smarter and wiser, and is basically immune to suffering.”
That’s a bet I’d at least consider taking. Odds are, you might feel that way too.
(If you don’t, that’s okay, but it means the Cosmic Flipper still isn’t offering you enough. What would need to be on the table for you, personally, to actually consider wagering the fate of the universe on a coinflip? What would the Cosmic Flipper have to offer? How much better does the world have to be, in the “heads” case, that you would be tempted?)
Suppose I do take the bet, and get lucky. How do you double that? Now we’re talking something on the order of “all animals everywhere also stop suffering” and I don’t even know what else.
By the time we get to flipping the coin five, ten, or a hundred times, I literally can’t even conceive of what sort of offer it would take to make a 50% chance of imploding utopia sound like a good price to pay. It’s incredibly difficult to wrap our brains around what “doubling the value in the world” actually means. And that’s just the tip of the iceberg.
We already court apocalypse
The thought experiment gets even more complicated when you factor in existing risks.
If you buy the arguments about threats from artificial superintelligence—which I do, for the record—then our world most likely has only a few years or decades left before we’re eaten by an unaligned machine. If you don’t buy those arguments, there’s still the 1 in 10,000 chance per year that we all nuke ourselves to death (or into the Stone Age), which is similar to the odds that you die this year in a car crash (if you’re in the US). Even if humanity never invents another superweapon, there’s still the chance that Earth gets hit by a meteor or Mother Nature slaughters our civilization with the next Black Death before we get our collective shit together.
What does it mean to “double the expected value of the universe” given the threat of possible extinction? I genuinely don’t know. And we can’t just say “well, holding x-risk constant...” because any change to the world that’s big enough to double its expected utility is going to massively affect the odds of human extinction.
When it comes to thought experiments like this, we can’t just rely on what first pops into our head when we hear the phrase “double expected value.” For the bargain to make sense to a true expected-utility maximizer, it has to still sound like a good deal even after all these considerations are factored in.
Everything breaks down at infinity
OK, so maybe it’s a good idea to flip the coin once or twice, or even many times. But if you take this bet an infinite number of times, then you’re guaranteed to destroy the universe. Right?
Firstly, lots of math breaks down at infinity. Infinity is weird like that. I don’t think there exists a value system that can’t be tied in knots by some contrived thought experiment involving infinite regression, and even if there did, I doubt it would be one I wanted to endorse.
Secondly, and more importantly, I question whether it is possible even in theory to produce infinite expected value. At some point you’ve created every possible flourishing mind in every conceivable permutation of eudaimonia, satisfaction, and bliss, and the added value of another instance of any of them is basically nil. In reality I would expect to reach a point where the universe is so damn good that there is literally nothing the Cosmic Flipper could offer me that would be worth risking it all.
And given the nature of exponential growth, it probably wouldn’t even take that many flips to get to “the universe is approximately perfect”. Sounds like a pretty good deal.
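For anyone who does want the math written out, here is a minimal sketch of the naive arithmetic behind repeated flips. The 2.01 multiplier is my stand-in for “slightly more than double” and is not part of the thought experiment. The model also treats value as unbounded, which is exactly the assumption I questioned above.

```python
from fractions import Fraction

def repeated_flip(n_flips, multiplier=Fraction(201, 100)):
    """Naive model of taking the bet n_flips times in a row.

    Returns (probability the universe survives,
             value multiplier if every flip lands heads,
             expected value multiplier).
    The multiplier 2.01 is an illustrative assumption.
    """
    p_survive = Fraction(1, 2) ** n_flips
    value_if_survive = multiplier ** n_flips
    expected_value = p_survive * value_if_survive
    return p_survive, value_if_survive, expected_value

for n in (1, 5, 10, 100):
    p, v, ev = repeated_flip(n)
    print(f"{n:>3} flips: P(survive)={float(p):.3g}, "
          f"value x{float(v):.3g}, EV x{float(ev):.3g}")
```

The survival odds halve with every flip while the expected value creeps upward, which is the whole tension in one loop. If value saturates at some point, the multiplier stops being on offer and the loop simply ends.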
Conclusion
The point I’m hoping to make is that this coinflip thought experiment suffers from a gap between the mathematical ideal of “maximizing the expected value in the universe” and our intuitions about it.
On a more specific level, I wish people would stop saying “Of course SBF had a terrible understanding of risk, he took EV seriously!” as though SBF’s primary failing was being a utilitarian, and not being reckless and hopelessly blinkered about the real-world consequences of his actions.
“Would you destroy a better world to save this one?”
From Ada Palmer’s Terra Ignota, that might be an interesting reframing of this wager: you are destroying (the chance of) a better world, more than twice as good as this one, to guarantee the survival of this one.
That is an interesting reframing of this wager!
I think you’ve made a motte-and-bailey argument:
Motte: The payoff structure of the cosmic flip/St. Petersburg Paradox applied to the real world is actually much better than double-or-nothing, and therefore you should play the game.
Bailey: SBF was correct in saying you should play the double-or-nothing St. Petersburg Paradox game.
Your motte is definitely defensible. Obviously, you can alter the payoff structure of the game to a point where you should play it.
That does not mean “there’s no real paradox”; it just means you are no longer talking about the paradox. SBF literally said he would take the game in the specific case where the game was double-or-nothing. Totally different!
This ends my issue with your argument, but I’ll also share my favorite anti-St. Petersburg Paradox argument since you didn’t really touch on any of the issues it connects to. In short: the definition of expected value as the mean outcome is inappropriate in this scenario and we should instead use the median outcome.
This paper makes the argument better than I can if you’re curious, but here’s my concise summary:
Mean values are perhaps appropriate if we play the game many (or infinity) times. In these situations, through the law of large numbers, the mean outcome of the games played will approach the mean interpretation of expected value.
For a single play-through (as in the thought experiment) the mean is not appropriate, as the law of large numbers does not apply. Instead, we should value the game by its median outcome: the outcome one should reasonably expect.
Indeed, if you have people actually play this game, their betting behavior is more consistent with an intuition of median expected value (this is tested in the paper).
There’s an argument Median EV is the better interpretation even when playing multiple times. In these situations you can think of the game as “playing the game multiple times, once.” This resolves the paradox in all but the infinite cases.
If you use the median interpretation of EV for finite trials of the game, there is no paradox.
A personal gripe: I find it more than a little stupid that the “expected value” is a value you don’t actually “expect” to observe very frequently when sampling highly skewed distributions.
Mathematicians and Economists have taken issue with the mean definition of EV basically as long as it has existed. Regardless of whether or not you agree with it, it seems pretty obvious to me that it is inappropriate to use the mean to value single trial outcomes.
So maybe in the real world we should play the game, but I firmly believe we should value the game using medians and not means. Do we get to play the world outcome optimization game multiple/infinite times? Obviously not.
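To make the mean/median gap concrete, here is a rough sketch using a capped St. Petersburg game. The payoff convention (first heads on toss k pays 2^k) and the cap (the game pays 0 if no heads appears within the limit) are assumptions chosen for illustration, not details taken from the paper.

```python
from fractions import Fraction

def st_petersburg_outcomes(max_tosses):
    """Sorted (payoff, probability) pairs for a capped St. Petersburg game.

    Assumptions: first heads on toss k pays 2**k; if no heads shows up
    within max_tosses, the game pays 0.
    """
    outcomes = [(2 ** k, Fraction(1, 2 ** k)) for k in range(1, max_tosses + 1)]
    outcomes.append((0, Fraction(1, 2 ** max_tosses)))
    return sorted(outcomes)

def mean(outcomes):
    """Probability-weighted average payoff."""
    return sum(payoff * prob for payoff, prob in outcomes)

def median(outcomes):
    """Smallest payoff whose cumulative probability reaches 1/2."""
    cumulative = Fraction(0)
    for payoff, prob in outcomes:
        cumulative += prob
        if cumulative >= Fraction(1, 2):
            return payoff

game = st_petersburg_outcomes(40)
print("mean:", mean(game), "median:", median(game))
```

Under these assumptions the mean grows by one unit for every extra toss allowed, while the median stays at the smallest payoff no matter how high the cap goes.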
I don’t actually mean the thing you’re calling the motte at all, and I’m not sure I agree with the bailey either. The thought experiment as I understand it was never quite a St. Petersburg Paradox, because both the payout (“double universe value”) and the method of choosing how to play (single initial payment vs. repeated choice betting everything each time) are different. It also can’t literally be applied to the real world at all. Part of the point is that I don’t even know what it would look like for this scenario to be possible in the real world; there are too many other considerations at play.
In the case I’m imagining, the Cosmic Flipper figures out whatever value you currently place on the universe—including your estimated future value—and slightly-more-than-doubles it. Then they offer the coinflip with the tails-case being “destroy the universe.” It’s defined specifically as double-or-nothing, technically slightly better than double-or-nothing, and is therefore worth taking to a utilitarian in a vacuum. If the Cosmic Flipper is offering a different deal then of course you analyze it differently, but that’s not what I understood the scenario to be when I wrote my post.
This very much depends on the rate of growth.
For most human beings, this is probably right, because their values have a function that grows slower than logarithmic, which leads to bounds on the utility even assuming infinite consumption.
But it’s definitely possible in theory to generate utility functions that have infinite expected utility from infinite consumption.
You are, however, pointing to something very real here: utility theory loses a lot of its niceness in the infinite realm, and while there might be something like a utility theory that can handle infinity, it will have to lose a lot of very nice properties that it had in the finite case.
See these 2 posts by Paul Christiano for why:
https://www.lesswrong.com/posts/hbmsW2k9DxED5Z4eJ/impossibility-results-for-unbounded-utilities
https://www.lesswrong.com/posts/gJxHRxnuFudzBFPuu/better-impossibility-result-for-unbounded-utilities
Growing slower than logarithmic does not help. Only being bounded in the limit gives you, well, a bound in the limit.
“Bounded utility solves none of the problems of unbounded utility.” Thus the title of something I’m working on, on and off.
It’s not ready yet. For a foretaste, some of the points it will make can be found in an earlier unpublished paper “Unbounded Utility and Axiomatic Foundations”, section 3.
The reason that bounded utility does not help is that any problem that arises at infinity will already practically arise at a sufficiently large finite stage. Repeated plays of the finite games discussed in that paper will eventually give you a payoff that has a high probability of being close (in relative terms) to the expected value. But the time it takes for this to happen grows exponentially with the lengths of the individual games. You are unlikely to ever see your theoretically expected value, however long you play. The infinite game is non-ergodic; the game truncated to finitely many steps and finite payoffs is ergodic only on impractical timescales.
Infinitude in problems like these is better understood as an approximation to the finite, rather than the other way round. (There’s a blog post by Terry Tao on this theme, but I’ve lost the reference to it.) The problems at infinity point to problems with the finite.
Thanks for catching that error, I did not realize this.
I think I got it from here:
https://www.lesswrong.com/posts/EhHdZ5yBgEvLLx6Pw/chad-jones-paper-modeling-ai-and-x-risk-vs-growth
I definitely agree that the problems of infinite utilities are approximately preserved by the finitary version of the problem, and while there are situations where you can get niceness assuming utilities are bounded (conditional on giving players exponentially large lifespans), it’s not the common or typical case.
Infinity makes things worse in that you no longer get any cases where nice properties like ergodicity or dominance are consistent with other properties, but yeah the finitary version is only a little better.
The thought experiment is not about the idea that your VNM utility could theoretically be doubled, but instead about rejecting diminishing returns to actual matter and energy in the universe. SBF said he would flip with a 51% chance of doubling the universe’s size (or creating a duplicate universe) and a 49% chance of destroying the current universe. Taking this bet requires a stronger commitment to utilitarianism than most people are comfortable with; your utility needs to be linear in matter and energy. You must be the kind of person that would take a 0.001% chance of colonizing the universe over a 100% chance of colonizing merely a thousand galaxies. SBF also said he would flip repeatedly, indicating that he didn’t believe in any sort of bound to utility.
This is not necessarily crazy—I think Nate Soares has a similar belief—but it’s philosophically fraught. You need to contend with the unbounded utility paradoxes, and also philosophical issues: what if consciousness is information patterns that become redundant when duplicated, so that only the first universe “counts” morally?
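As a rough illustration of what “linear in matter and energy” commits you to, the sketch below compares the 0.001% gamble against the sure thousand galaxies under a linear and a concave utility function. The galaxy count is an order-of-magnitude assumption, and the logarithmic utility is just one arbitrary example of diminishing returns.

```python
import math

GALAXIES_IN_UNIVERSE = 2e11  # assumption: rough order-of-magnitude estimate
SAFE_GALAXIES = 1_000        # the sure-thing option
P_WIN = 0.00001              # the 0.001% chance of colonizing everything

def expected_utility(utility):
    """Return (expected utility of the gamble, utility of the sure thing)."""
    gamble = P_WIN * utility(GALAXIES_IN_UNIVERSE) + (1 - P_WIN) * utility(0)
    sure_thing = utility(SAFE_GALAXIES)
    return gamble, sure_thing

def linear(galaxies):
    return galaxies

def concave(galaxies):
    # log1p keeps utility(0) == 0; an arbitrary diminishing-returns choice
    return math.log1p(galaxies)

for name, utility in (("linear", linear), ("concave", concave)):
    gamble, sure_thing = expected_utility(utility)
    choice = "gamble" if gamble > sure_thing else "sure thing"
    print(f"{name}: gamble={gamble:.3g}, sure={sure_thing:.3g} -> {choice}")
```

With these numbers the linear agent gambles and the concave agent does not. The linear result holds for any universe larger than about a hundred million galaxies, so the exact count matters little.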
Did he really? If true, that’s actually much dumber than I thought, but I couldn’t find anything saying that when I looked.
I wouldn’t characterize that as a “commitment to utilitarianism”, though; you can be a perfect utilitarian and have value that is linear in matter and energy (and presumably number of people?), or be a perfect utilitarian and have some other value function.
The possible redundancy of conscious patterns was one of the things I was thinking about when I wrote:
I think you can do some steelmanning of the anti-flippers with something like Lara Buchak’s arguments on risk and rationality. Then you’d be replacing the vague “the utility maximizing policy seems bad” argument with a more concrete “I want to do population ethics over the multiverse” argument.
Alas, I am not familiar with Lara Buchak’s arguments, and the high-level summary I can get from Googling them isn’t sufficient to tell me how it’s supposed to capture something utility maximizing can’t. Was there a specific argument you had in mind?
Are you familiar with Kelly betting? The point of maximizing log expectation instead of pure expectation isn’t that happiness grows on a logarithmic scale or whatever; it’s to maximize long-term expected value. This kills off making bets where “0” is on the table (as log(0) is minus infinity). Whether or not that’s appropriate is still an interesting topic for discussion because, as you mentioned, x-risks exist anyway.
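A minimal sketch of that calculation, using the standard Kelly formula for a simple win/lose bet. The 1.01 net odds (a win turns each unit staked into 2.01) are an illustrative stand-in for “slightly more than double”.

```python
import math

def kelly_fraction(p_win, net_odds):
    """Kelly stake f* = p - (1 - p) / b for a bet paying b-to-1 on the stake."""
    return p_win - (1 - p_win) / net_odds

def expected_log_growth(fraction, p_win, net_odds):
    """Expected log of the wealth multiplier after one bet.

    Assumes 0 <= fraction <= 1 and p_win < 1.
    """
    if fraction == 1:
        return float("-inf")  # a loss leaves nothing, and log(0) is -infinity
    return (p_win * math.log(1 + fraction * net_odds)
            + (1 - p_win) * math.log(1 - fraction))

P_WIN, NET_ODDS = 0.5, 1.01  # illustrative numbers, not from the thought experiment
f_star = kelly_fraction(P_WIN, NET_ODDS)
print(f"Kelly stake: {f_star:.4f} of everything you have")
print("log growth at Kelly stake:", expected_log_growth(f_star, P_WIN, NET_ODDS))
print("log growth staking it all:", expected_log_growth(1, P_WIN, NET_ODDS))
```

On these numbers a Kelly bettor stakes about half a percent of the universe per flip, and staking the whole thing scores minus infinity regardless of how good the odds are.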
Kelly betting does not maximize long-term expected value in all situations. For example, if some bets are offered only once (or only a finite number of times), then you can get better long-term expected utility by sometimes accepting bets with a potential “0”-utility outcome.
Heard of it, but this particular application is new. There’s a difference, though, between “this formula can be a useful strategy to get more value” and “this formula accurately reflects my true reflectively endorsed value function.”
I’m dead sure you’d need more than ‘just more than a doubling’ for the payoff to make sense. Let’s assume two things.
Net utility naturally doubles for humans roughly every 300,000 years. (This is deliberately conservative. Recent history would suggest something much faster, but the numbers are so stupidly asymmetric that using recent history would be silly. Homo sapiens has been around that long, and net utility has doubled at least once in that time.)
The universe will experience heat death in roughly 10^100 years.
Before you even try to factor in the volatility costs, time value of enjoying that utility, etc., your payoff has to be something like 2^(10^95).
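As a quick check on that order of magnitude, the doubling count implied by the two assumptions above can be computed with exact integers. This is only a back-of-the-envelope sketch, and the variable names are mine.

```python
YEARS_UNTIL_HEAT_DEATH = 10 ** 100  # assumption from the comment
YEARS_PER_DOUBLING = 300_000        # assumption from the comment

# Number of times net utility would double before heat death
doublings = YEARS_UNTIL_HEAT_DEATH // YEARS_PER_DOUBLING

# Power of ten of the doubling count (digit count minus one)
exponent_magnitude = len(str(doublings)) - 1

print(f"about {str(doublings)[0]}.{str(doublings)[1]} x 10^{exponent_magnitude} doublings")
```

That comes to roughly 3.3 × 10^94 doublings, so the required payoff is 2 raised to a number within an order of magnitude of 10^95.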
Edit: alright, since apparently we’re having trouble with this argument, let’s clarify it.
It’s not good enough for a bet to “make sense,” in some isolated fashion. You have to evaluate the opportunity cost of what you could have done with the thing you’re betting instead. My original comment was suggesting a method to evaluate that opportunity cost.
The post makes this weird “if the utility was just big enough” move, while still attempting to justify the original, incredibly stupid bet. It’s a bet. Pick a payoff scheme, and the math works, or it doesn’t, when compared to some opportunity cost, not some nonsensical bet from nowhere. Saying that the universe is big, and valuable, and vaguely pointing at a valuation method, but then pointing to “but just make the payout bigger” misses the point. Humans are bad at evaluating such structures, and using them to build your moral theories has issues.
For the coinflip to make sense, your opportunity cost has to approach zero. Give any reasonable argument that the universe existing has an opportunity cost approaching zero, and the bet gets interesting.
But almost any valuation method you pick for the universe gets to absurdly high ongoing value. That isn’t a Pascal’s Mugging, that’s deciding to bet the universe.
Here’s how you get to opportunity cost near zero:
1. X-Risk greater than 50%
2. Humans are the only sapient species. (Getting past X-risk gets evaluated per species, the universal coinflip gets evaluated for the universe. That changes the math.)
3. Your certainty on both 1 and 2 is so high that your fudge factor doesn’t play with the fact you know the odds of coinflips.
If any of those three is not true, you can’t get a low enough opportunity cost to justify the coinflip. That still might not be enough, but you’re at least getting into the ballpark of having a discussion about the bet being sensible.
If anyone wants to argue, instead of downvoting, I’d take the argument. Maybe I’m missing something. But it’s just a stupid bet without some method of evaluating opportunity cost. Pick one.
I’m assuming the Cosmic Flipper is offering, not a doubling of the universe’s current value, but a doubling of its current expected value (including whatever you think the future is worth) plus a little more. If it’s just doubling current niceness or something, then yeah, that’s not nearly enough.
I’d missed that, thank you for pointing that out.
If “expected” effectively means that you’re being offered a bet that is good by definition, so that even at 50/50 odds you take the bet, I suppose that’s true. If the bet is static for a second flip, it wouldn’t be a good deal, but if it dynamically altered such that it was once again a good bet by definition, I suppose you keep taking the bet.
If you’re engaging with the uncertainty that people are bad at evaluating things like “expected utility” then at least some of the point is that our naive intuitions are probably missing some of the math, and costs, and the bet is likely a bad bet.
If I were trying to give credence to that second possibility, I’d say that the word “expected” is now doing a bunch of hidden heavy lifting in the payoff structure, and you don’t really know what lifting it’s doing.