Consider Reconsidering Pascal’s Mugging
[Epistemic Status: confident considering the outside view]
The Pascal’s Mugging dilemma is this: a random person walks up to you in the street and says that if you don’t give them a dollar, they’ll destroy the earth tomorrow. Do you pay? Since the probability that they will actually do so cannot plausibly be low enough to make handing over the dollar negative expected utility, decision theory (whether causal, evidential, or functional) says you do.
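Spelled out as a sketch (the symbols here are mine, just to make the comparison explicit): let p be the probability that the mugger both can and will destroy the earth if unpaid, D the disutility of the earth being destroyed, and U($1) the utility of keeping the dollar. Then paying beats not paying whenever

\[ p \cdot D \;>\; U(\$1), \]

and since D is astronomically large, even a vanishingly small p seems to make the left side win.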
The dominant view is
Paying is obviously wrong. Therefore, this is an as-yet-unsolved problem. There is probably a theoretical insight by which decision theory should be amended to avoid this behavior.
But this has never made sense to me. My view has always been
Decision theory is almost certainly going to produce the correct output on this problem because it is conceptually simple. Therefore, either there are reasons why decision theory doesn’t actually tell you to pay, or paying is correct.
It’s strange to me that the intuition “you shouldn’t pay” is apparently valued much more highly than the intuition “FDT is correct,” to the point that the idea that it could possibly be correct to pay isn’t even on the table. So far, I have never read a satisfying way to deal with this problem, which strengthens my suspicion that there is none. Moreover, the ideas which I have seen mostly seem very misguided to me, particularly anything that involves rounding small numbers to zero or treating utility as non-linear. Therefore-
Objection!
… yes?
If your decision theory pays, then you can be exploited heavily, by being mugged repeatedly.
No, I can’t. The probability that the mugger is telling the truth doesn’t plausibly increase if they ask multiple times, while the cumulative cost of being mugged repeatedly does. A single dollar already has some chance of saving the earth in other ways, and that foregone chance grows roughly linearly with repeated asks, both in the literal case where you are a human and in the metaphorical case where you are an aspiring AI. Finally, there is the fact that appearing muggable is itself negative utility.
Yeah, but that’s just a hack, and if your argument relies on this, then that’s terrible.
Why?
I don’t personally feel uncomfortable paying, or having an AI that would pay in such a scenario. As I said, the probability doesn’t increase with repeated asks, so that’s very similar to just asking for more in the first place.
So what if they do ask for more in the first place? If the same homeless person asked you for $200, would you still pay?
I would, but only because the fear of being responsible for someone else’s death would affect me personally. Otherwise, I don’t think it’d be correct. If they threatened to destroy the earth instead, I wouldn’t pay; I think giving the $200 to MIRI has better odds of preventing that.
You’ve been avoiding part of the problem. What about really high numbers? Don’t your explanations break apart there?
I don’t think so. I hold that the chance to save a googolplex people is also higher by donating than by paying. When people discuss this issue, noting that paying can have arbitrarily high payoffs, they always forget that other ways of spending the money can also have arbitrarily high payoffs. I don’t think this changes in the case of an AI, either. Yes, there is always some chance that the AI is misprogrammed in a way that precludes it from seeing how the mugger could save a googolplex people. But there is also a chance that giving up whatever resources the mugger asked for will prevent it from figuring that out itself.
What about infinite payoffs?
I’m not sure. I think that is a separate issue, and I want to explicitly exclude it from this post.
And what about if the sum asked is sufficiently small, huh?
Then you pay.
I realize none of the arguments here are anything most people couldn’t come up with in five minutes. However, as it stands, no one else is making them. It just seems really obvious to me that this is talking in circles: either there is a reason why FDT wouldn’t pay, or it is correct to pay. Like, come on! If paying the mugger actually is the highest-utility option you have, then why wouldn’t you take it? Doesn’t that seem weird? I find it weird, much weirder than the idea that paying might sometimes be correct. I think it is useful to look at a mugging scenario as simply providing you with an additional option for spending money. If there is a better option, ignore it. If not, then there is no reason not to take it.
Another thing that I’ve never seen anyone point out is that the total damage from being mugged seems to be naturally bounded above. No matter how high the utility at stake gets, there is always some amount of resources that has a better chance of gaining that much utility when used in other ways. It makes no sense to treat the achievable utility as bounded for those other uses but unbounded for the mugger. The mugger claiming that they can affect a googolplexplex lives doesn’t give them exclusive access to a non-zero probability of affecting a googolplexplex lives; other routes to that exist too. It will never be positive expected utility to hand over a really significant amount of resources in response to threats, because at some point the probability that further resources will help with, say, negotiating with whoever is running the simulation just takes over, for any arbitrarily large payoff.
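One way to write this bound down (the names q and x* are mine, purely for illustration): let p be the probability that the mugger can actually deliver a payoff of size N, and q(x) the probability that x resources spent in other ways (research, trade, negotiating with whoever runs the simulation) secure a payoff at least that large. If q(x) grows with x while p stays fixed, then there is some threshold x* with

\[ q(x^{*}) \;\ge\; p, \]

and from that point on the same resources spent elsewhere have at least as good a chance at the same payoff, so paying the mugger anything beyond x* is never the best option, no matter how large N gets.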
Pascal’s Mugger has never seemed like anything other than “hey, here is an unintuitive result of correct decision theory” to me, and I believe the correct response would be to say “okay, interesting” and move on.
If you have an unbounded utility function and a broad prior, then expected utility calculations don’t converge. It’s not that decision theory is producing an answer and we are rejecting it; decision theory isn’t saying anything. This paper by Peter de Blanc makes the argument. The unbounded case is just as bad as the infinite case. Put a different way, the argument for representing preferences by utility functions doesn’t go through in cases where there are infinitely many possible outcomes.
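A toy version of the divergence argument (my simplification, not de Blanc’s exact statement): suppose the hypotheses H_1, H_2, … about what your action leads to have prior probabilities p_n and utilities u_n. If the utility function is unbounded and the prior’s tails are heavy enough, say p_n ≥ c/2^n while |u_n| ≥ 4^n, then

\[ \sum_n p_n\,|u_n| \;\ge\; \sum_n \frac{c}{2^{n}}\,4^{n} \;=\; c \sum_n 2^{n} \;=\; \infty, \]

so the expected utility \(\sum_n p_n u_n\) either diverges outright or is at best conditionally convergent, in which case its value depends on the order in which you list the hypotheses. That is the “rearranging divergent sums” point a couple of paragraphs down.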
That said, most people report that they wouldn’t make the trade, from which we can conclude that their utility functions are bounded, and so we don’t even have to worry about any of this.
I think that the argument you give, that there are much better ways of securing very large payoffs, is also an important part of making intuitive sense of the picture. The mugger was only ever a metaphor; there is no plausible view on which you’d actually pay. If you are sloppy you might conclude that there is some mugger with expected returns twice as large as whatever other plausible use of the money you are considering, but of course all of this is just an artifact of rearranging divergent sums.
That is the core of what replying to zulu’s post made me think.
I won’t say too much more until I’ve read up on more of the existing thinking, but as of now I strongly object to this.
I don’t think the conclusion follows, nor that utility functions should ever be bounded. We need another way to model this.
Send me $5 or I will destroy the universe. paypal.me/arizerner
Thank you for paying. I will not destroy the universe, nor will I issue similar threats against you in the future. In addition, you have demonstrated the admirable quality of willingness to put your money where your mouth is.
Why do you think that? What is the probability that the mugger does in fact have exclusive access to 3^^^^3 lives? And what is the probability for 3^^^^^3 lives?
By the way, what happens if a billion independent muggers all mug you for 1 dollar, one after another?
The same as if one mugger asks a billion times, I believe. Do you think the probability that a mugger is telling the truth is a billion times as high in the world where 1,000,000,000 of them ask the AI versus the world where just one asks? If the answer is no, then why would the AI think so?
In the section you quoted, I am not saying that other ways of affecting 3^^^^3 lives exist; I am saying that other ways with a non-zero probability of affecting that many lives exist, which is trivial, I think. A way to actually do this most likely does not exist.
So there is of course some probability that the mugger does have exclusive access to 3^^^^3 lives. Let’s call it p. What I am arguing is that it is wrong to assign a fairly low utility u to $1 worth of resources and then conclude “aha, since p · U(3^^^^3 lives) > u, it must be correct to pay!” The reason is that u is not actually small. Calculating u, the utility of one dollar, itself includes considering various mugging-like scenarios: what if just a bit of additional self-improvement is all that’s needed to see how 3^^^^3 lives can be saved? It is up to the discretion of the AI to decide when the above inequality holds.
So p might be much larger than 1/3^^^^3, but u is actually very large, too. (I am usually a fan of making up specific numbers, but in this case that doesn’t seem useful.)
I think you really should. I asked you to compare P(mugger can save 3^^^^3 lives) with P(mugger can save 3^^^^^3 lives). The second probability should be only slightly lower than the first; it can’t possibly be lower by a factor of 3^^^^^3/3^^^^3, because if you’re talking to an omnipotent Matrix Lord, the number of arrows means nothing to them. So it doesn’t matter how big u is: with enough arrows, P(mugger can save N lives) times U(N lives) is going to catch up.
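To put a rough shape on that (the 2^{-K(N)} prior here is an illustrative assumption, not something anyone in the thread committed to): suppose the probability that the mugger can deliver N falls off only with the length of N’s description, something like P(N) ≈ 2^{-K(N)} where K(N) is the number of symbols needed to write N down, while U(N) grows linearly in N. Along the sequence N = 3^^^^3, 3^^^^^3, 3^^^^^^3, … each extra arrow adds a constant to K(N) but blows N up unimaginably faster, so

\[ P(N)\,U(N) \;\approx\; 2^{-K(N)} \cdot N \;\longrightarrow\; \infty, \]

and no fixed u, however large, stays ahead of it.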
What does “low utility” mean? $1 presumably has a tenth of the utility of the $10 in my pocket right now, and it’s much lower than U(1 life), so it’s clearly not the most useful thing in the world, but aside from that, there isn’t much to say. The scale of utilities has a “0”, but the choice of “1” is arbitrary. Everything is high or low only in comparison to other things.
The muggers may or may not be independent. It’s possible that each of them has independent power to save a different set of 3^^^^3 lives. It’s also possible that all of them are lying, but P(a billion people are all lying) is surely much lower than P(one person is lying). I could imagine why you still wouldn’t pay, but if you did the math, the numbers would be very different from one person asking a billion times.
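The arithmetic behind “the numbers would be very different” (assuming, as in the hypothetical, that the muggers are independent and each is truthful with the same small probability p): the chance that at least one of the billion can actually deliver is

\[ 1 - (1 - p)^{10^{9}} \;\approx\; 10^{9} p \qquad \text{for } p \ll 10^{-9}, \]

roughly a billion times the single-mugger figure, whereas one mugger repeating the same threat a billion times provides essentially no new evidence, so there the probability stays at about p.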
What happened since is neither one nor the other, which is why I found it tricky to decide what to do. Basically, it seems to me that everything just comes down to the fact that expected utilities don’t converge. Every response I’d have to your arguments would run into that wall. This seems like an incredibly relevant and serious problem that throws a wrench into all discussions of this kind, and Pascal’s Mugging seems like merely a symptom of it.
So basically my view changed from “there’s no fire here” to “expected utilities don’t converge, holy shit, why doesn’t everyone point this out immediately?” But I don’t see Pascal’s Mugging as showcasing any problem independent of that, and I find the way I’ve heard it talked about before pretty strange.
Thank you for that post. The way I phrased this clearly misses these objections. Rather than addressing them here, I think I’ll make a part 2 where I explain exactly how I think about these points… or, alternatively, realize you’ve convinced me in the process (in that case I’ll reply here again).