A Thought on Pascal’s Mugging
For background, see here.
In a comment on the original Pascal’s mugging post, Nick Tarleton writes:
[Y]ou could replace “kill 3^^^^3 people” with “create 3^^^^3 units of disutility according to your utility function”. (I respectfully suggest that we all start using this form of the problem.)
Michael Vassar has suggested that we should consider any number of identical lives to have the same utility as one life. That could be a solution, as it’s impossible to create 3^^^^3 distinct humans. But this is also irrelevant to the create-3^^^^3-disutility-units form.
Coming across this again recently, it occurred to me that there might be a way to generalize Vassar’s suggestion in such a way as to deal with Tarleton’s more abstract formulation of the problem. I’m curious about the extent to which folks have thought about this. (Looking further through the comments on the original post, I found essentially the same idea in a comment by g, but it wasn’t discussed further.)
The idea is that the Kolmogorov complexity of “3^^^^3 units of disutility” should be much higher than the Kolmogorov complexity of the number 3^^^^3. That is, the utility function should grow only according to the complexity of the scenario being evaluated, and not (say) linearly in the number of people involved. Furthermore, the domain of the utility function should consist of low-level descriptions of the state of the world, which won’t refer directly to words uttered by muggers, in such a way that a mere discussion of “3^^^^3 units of disutility” by a mugger will not typically be (anywhere near) enough evidence to promote an actual “3^^^^3-disutilon” hypothesis to attention.
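As a toy illustration of this proposal (only a sketch: zlib-compressed length stands in, very crudely, for Kolmogorov complexity, which is uncomputable, and it is applied here to the mugger's claim itself rather than to the low-level world-state description the post actually calls for; the function names are mine):

    import zlib

    def complexity_bits(description: str) -> int:
        # Crude stand-in for Kolmogorov complexity: bits in the
        # zlib-compressed description (K itself is uncomputable).
        return 8 * len(zlib.compress(description.encode("utf-8")))

    def bounded_disutility(description: str) -> float:
        # The proposal: (dis)utility grows at most with the complexity of the
        # scenario description, not with the numerals quoted inside it.
        return float(complexity_bits(description))

    def complexity_prior(description: str) -> float:
        # Complexity-weighted prior, roughly 2^-K(description).
        return 2.0 ** -complexity_bits(description)

    threat = "I will create 3^^^^3 units of disutility unless you pay me $5"
    # Expected loss stays negligible: the bounded disutility cannot outrun the prior.
    print(complexity_prior(threat) * bounded_disutility(threat))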
This seems to imply that the intuition responsible for the problem is a kind of fake simplicity, ignoring the complexity of value (negative value in this case). A confusion of levels also appears implicated (talking about utility does not itself significantly affect utility; you don’t suddenly make 3^^^^3-disutilon scenarios probable by talking about “3^^^^3 disutilons”).
What do folks think of this? Any obvious problems?
Is your utility function such that there is some scenario for which you assign −3^^^^3 utils? If so, then the Kolmogorov complexity of “3^^^^3 units of disutility” can’t be greater than K(your brain) + K(3^^^^3), since I can write a program to output such a scenario by iterating through all possible scenarios until I find one which your brain assigns −3^^^^3 utils.
A prior of 2^-(K(your brain) + K(3^^^^3)) is not nearly small enough, compared to the utility −3^^^^3, to make this problem go away.
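A rough worked version of the comparison (the 10^16-bit figure for K(your brain) is only a loose illustrative assumption; K(3^^^^3) is at most a few hundred bits):

    \[
    3\uparrow\uparrow\uparrow\uparrow 3 \cdot 2^{-\left(K(\text{brain}) + K(3\uparrow\uparrow\uparrow\uparrow 3)\right)}
    \;\gtrsim\; 3\uparrow\uparrow\uparrow\uparrow 3 \cdot 2^{-10^{16}} \;\gg\; 1,
    \]

since \log_2(3\uparrow\uparrow\uparrow\uparrow 3) dwarfs 10^{16}, so the prior cannot pull the expected disutility down to anything manageable.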
Come to think of it, the problem with this argument is that it assumes that my brain can compute the utility it assigns. But if it’s assigning utility according to Kolmogorov complexity (effectively the proposal in the post), that’s impossible.
The same issue arises with having probability depend on complexity.
Ok, I think in that case my argument doesn’t work. Let me try another approach.
Suppose some stranger appears to you and says that you’re living in a simulated world. Out in the real world there is another simulation that contains 3^^^^3 identical copies of a utopian Earth-like planet plus another 3^^^^3 identical copies of a less utopian (but still pretty good) planet.
Now, if you press this button, you’ll turn X of the utopian planets into copies of the less utopian planet, where X is a 10^100 digit random number. (Note that K(X) is of order 10^100 which is much larger than K(3^^^^3) and so pressing the button would increase the Kolmogorov complexity of that simulated world by about 10^100.)
What does your proposed utility function say you should do (how much would you pay to either press the button or prevent it being pressed), and why?
Utility is monotonic, even though complexity isn’t. (Thus X downgrades out of the 3^^^^3 wouldn’t be as bad as, say, 3^^^3 downgrades.) However, utility is bounded by complexity: the complexity of a scenario with utility N must be at least N. (Asymptotically, of course.)
Probably not, if “you” is interpreted strictly to refer to my current human brain, as opposed to including more complex “enhancements” of the latter.
Given that there’s no definition for the value of a util, arguments about how many utils the universe contains aren’t likely to get anywhere.
So let’s make it easier. Suppose the mugger asks you for $1, or ey’ll destroy the Universe. Suppose we assume the Universe to have 50 quadrillion sapient beings in it, and to last for another 25 billion years ( = 1 billion generations if average aliens have similar generation time to us) if not destroyed. That means the mugger can destroy 50 septillion beings. If we assign an average being’s life as worth $100000, then the mugger can destroy $5 nonillion (= 5 * 10^30).
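A quick check of the arithmetic above (all figures are the comment's own assumptions):

    beings_now       = 5e16   # 50 quadrillion sapient beings
    generations_left = 1e9    # 25 billion more years at roughly 25 years per generation
    value_per_life   = 1e5    # $100,000 per life

    beings_at_stake  = beings_now * generations_left      # 5e25, i.e. 50 septillion
    dollars_at_stake = beings_at_stake * value_per_life   # 5e30, i.e. $5 nonillion
    print(beings_at_stake, dollars_at_stake)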
Given that there have been reasonable worries about, e.g., the LHC destroying the Universe, I think the probability that a person can destroy the universe is rather greater than 1 in 5 nonillion (to explain why it hasn’t been done already, assume the Great Filter comes at the stage of industrialization). I admit that the probability of someone with an LHC-level device being willing to destroy the Universe for the sake of $1 would be vanishingly low, but until today I wouldn’t have thought someone would kill 6,790 people to protest a blog’s comment policy either.
Citation needed.
Looking at:
http://en.wikipedia.org/wiki/Safety_of_particle_collisions_at_the_Large_Hadron_Collider
...the defenders are doing a PR whitewash job. They can’t even bring themselves to mention probabilities!
Maybe it’s because there would be no point in mentioning probabilities smaller than e^(-10^9) (the evidence you get from the fact that the sun still exists; citation), since humans don’t deal well with small numbers.
But no, “whitewash job” :P
IMO, this is most likely to do with the perceived difference between “no risk” and “some risk”. I am sure the authors were capable of producing a quantitative report—and understand that that is the scientific approach—but sat on any figures they might have had after being instructed about the presentation of the desired conclusion.
This sounds a bit conspiracy-ey. Any evidence for your claims, e.g. a trend of similar papers using probability assessments rather than just stopping at “these collisions have happened a very large number of times and we ain’t dead yet”?
Risk assessments are commonly quantitative.
Fair enough. So we might have enough data for the analysis. But “are commonly quantitative” isn’t even weak evidence either way—that is to say, this paper being less quantitative doesn’t ring any alarm bells per se, since it’s not unusual. But we can get evidence by looking closer: are qualitative risk assessments more likely to be “instructed about the desired conclusion” than quantitative ones? What complicating variables can we prune out to try and get the causal relationship whitewash->qualitative?
Basically what I’m trying to communicate is that there are two ways you could convince me this was a fraud: you could have better knowledge of the subject matter than me and demonstrate directly how it was a fraud, or you could have detailed evidence on frauds, good enough to overcome my prior probability that this isn’t a fraud. Saying “they were probably able to produce a more quantitative report, but didn’t, so it’s a fraud” is neither.
I never used the term “fraud”. You seem to be reading more into this than was intended. I just think it is funny that an official LHC risk assessment paper presumably designed to reassure fails to come up with any probabilities—and just says: “it’s safe”. To someone like me, that makes it look as though it is primarily a PR exercise.
IIRC, others have observed this before me—though I don’t have the reference handy.
I would classify a supposedly scientific paper that “sat on figures” and “was instructed about the desired conclusion” as a fraud. If you would prefer “whitewash” (a word you did use) instead of “fraud” I would be happy to change in the future.
But the paper was quite a bit longer than “it’s safe,” seemed quite correct (though particle physics isn’t my field), and indeed gave you enough information to calculate approximate probabilities yourself if you wanted to. So to me it looks like you’re judging on only a tiny part of the information you actually have.
Because it doesn’t actually say the words "not greater than 1 in 3E22, and that’s just calculating using the cosmic rays that have hit the earth in the last 4.5E9 years", it should be ignored?
Uh, what? I think I said “PR exercise”, not “worthless document”.
I am most disappointed the Brian Cox quote didn’t make it into that article. The quote was actually newsworthy, too.
Lifeboat Foundation fits my criteria of “reasonable”, as do some of the commenters here. Even if there’s only a one in a million risk of destroying the world, that’s still equivalent to killing 6,000 people with probability one; potentially destroying the Universe should require even more caution.
There’s not even a one in a million; it’s closer to “But there’s still a chance, right?”
And you’re still dealing in probabilities too small to sensibly calculate in this manner and be saying anything meaningful—“switching on the LHC is equivalent to killing 6,000 people for certain” is a statement that isn’t actually sensible when rendered in English, and I don’t see another way to render in English your calculated result that switching it on is “equivalent to killing 6,000 people with probability one”. But please do enlighten me.
(I realise you’re multiplying 6E9 by 1E-6 and asserting that six billion conceptual millionth-of-a-person slivers equals six thousand actual existing people. “Shut up and multiply” doesn’t stop me balking at this, and that the result says “switching on the LHC is equivalent to killing 6,000 people for certain” seems to constitute a reductio ad absurdum for however one gets there.)
Rees estimated the probability of the LHC destroying the world at 1 in 50 million, and it would be surprising if he were one of the few people in the world without overconfidence bias, or one of the few people in the world who doesn’t underestimate global existential risks.
I assume from the first sentence that you believe an appropriate probability to have for the LHC destroying the world is less than one in a billion. Trusting anyone, even the world scientific consensus, with one in a billion probability, seems excessive to me—the world scientific consensus has been wrong on more than one in every billion issues it thinks it’s sure about. If you’re working not off the world scientific consensus but off your own intuition, that seems even stranger—if, for example, the LHC will destroy the world if and only if strangelets are stable at 10 TeV, then you just discovered important properties about the stability of strangelets to p < .000000001 certainty, which seems like the sort of thing you shouldn’t be able to do without any experiments or mathematics. If you’re working off of a general tendency for the world not to be destroyed, well, there were five mass extinction events in the past billion years, so ignoring for the moment the tendency of mass extinctions to take multiple years, that means the probability of a mass extinction beginning in any particular year is about 5/billion. If I were to tell you "The human race will become extinct the year the LHC is switched on", would you really tell me "Greater than 80% chance it has nothing to do with the LHC" and go about your business?
I am still uncomfortable with the whole “shut up and multiply” concept too. But I think that’s where the “shut up” part comes in. You don’t have to be comfortable with it. You don’t have to like it. But if the math checks out, you just shut up and keep your discomfort to yourself, because math is math and bad things happen when you ignore it.
Here we run into the problem of “garbage in, garbage out.”
He assigned 50% extinction risk for the 21st century in his book. His overall estimates of risk are quite high.
What your probability discussion there seems to me to be saying is “these numbers are too small to think about in any sensible way, let alone calculate.” Trying to think about them closely resembles an argument that the way to deal with technological existential risk is to give up technology and go back to the savannah (caves are too techy).
But the math leads to statements like “switching on the LHC is equivalent to killing 6,000 people for certain”, which seems to constitute a reductio ad absurdum of whatever process led to such a sentence.
(You could justify it philosophically, but you’re likely to get an engineer’s answer: “No it isn’t. Here, I’ll show you. (click) Now, how many individuals did that just kill?”)
One day I would like to open up an inverse casino.
The inverse casino would be full of inverse slot machines. Playing the inverse slot machines costs negative twenty-five cents—that is, each time you pull the bar on the machine, it gives you a free quarter. But once every few thousand bar pulls, you will hit the inverse jackpot, and be required to give the casino several thousand dollars (you will, of course, have signed a contract to comply with this requirement before being allowed to play).
You can also play the inverse lottery. There are ten million inverse lottery tickets, and anyone who takes one will get one dollar. But if your ticket is drawn, you must pay me fifteen million dollars. If you don’t have fifteen million dollars, you will have various horrible punishments happen to you until fifteen million dollars worth of disutility have been extracted from you.
If you believe what you are saying, it seems to me that you should be happy to play the inverse lottery, and believe there is literally no downside. And it seems to me that if you refused, I could give you the engineer’s answer “Look, (buys ticket) - a free dollar, and nothing bad happened to me!”
And if you are willing to play the inverse lottery, then you should be willing to play the regular lottery, unless you believe the laws of probability work differently when applied to different numbers.
The hedge fund industry called. They want their idea of selling far out-of-the-money options back.
Doesn’t this describe the standard response to cars?
Just think of all the low-probability risks cars subsume! Similarly, if you take up smoking you no longer need to worry about radon in your walls, pesticides in your food, air pollution or volcano dust. It’s like a consolidation loan! Only dumber.
Sorry, I don’t understand. Response to cars?
Most of life is structured as a negative lottery. You get in a car, you get where you’re going much faster- but if the roulette ball lands on 00, you’re in the hospital or dead. (If it only lands on 0, then you’re just facing lost time and property.)
And so some people are mildly afraid of cars, but mostly people are just afraid of bad driving or not being in control- the negative lottery aspect of cars is just a fact of life, taken for granted and generally ignored when you turn the key.
The reason I recommend David not play the inverse lottery isn’t because all things that give small rewards for a small probability of great loss are bad, it’s because the inverse lottery (like the regular lottery) is set up so that the expected utility of playing is lower than the expected utility of not playing. An inverse lottery in which the expected utility of playing is better than the expected utility of not playing would be a good bet.
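For concreteness, here is the expected-value comparison being pointed at, using the inverse-lottery figures given above:

    # Inverse lottery: ten million tickets, $1 for taking one,
    # pay $15 million if your ticket is drawn.
    p_drawn    = 1 / 10_000_000
    ev_play    = 1 + p_drawn * (-15_000_000)   # = 1 - 1.5 = -$0.50 per ticket
    ev_decline = 0.0
    print(ev_play, ev_decline)   # declining beats playing by half a dollar

The regular lottery is a bad bet for the same structural reason: the expected value of playing is below the expected value of not playing.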
A good argument for driving cars wouldn’t be that an accident could never happen and is ridiculous (which is how I interpret David’s pro-LHC argument) but that the benefits gained from driving cars outweigh the costs.
In the case of your original assertion—that it was reasonable to worry about the risks of the LHC—the argument for the probability of disaster being too small to worry about is that we’re not working out the probability assuming such events have never happened before—we’re working out the probability assuming such events and stronger ones happen all the time, because they do. So very many collisions occur just near Earth of greater energies that this puts a strong upper bound on the chances of disaster occurring in the LHC itself. Even multiplied by 6E9, the number is, as I said, much less like 1E-6 and much more like “but there’s still a chance, right?”
No. No, there really isn’t.
Again, let’s say by “but there’s still a chance” you’re saying the chance of CERN causing an apocalypse scenario is less than one in a billion. You say that “the argument” for this is that such collisions happen near Earth all the time.
Suppose I were to posit that black holes produced by cosmic rays have an acceleration that would lead them to fly through the Earth without harming it, but black holes produced by the LHC would be slower and thus able to destroy the Earth where their cosmic-ray-produced brethren could not.
Suppose I were to tell you that either this above paragraph was the view of a significant part of the relevant physicist community (say, greater than five percent) or that I was bluffing and totally made it up.
I offer you a bet. If I’m bluffing, I’ll give you a dollar. If I’m not, you give me ten thousand dollars. No, you can’t Google it to check. If your utility function isn’t linear with respect to money, I’m happy to lower it to something like 1:1000 instead.
If you don’t take the bet, it means you’re not even sure to ten thousand to one odds that that particular argument holds, which makes it very iffy to use as the lynchpin of an argument for billion to one odds.
I have two dodges for this bet: first, the cost of obtaining a dollar from someone distant to me is higher than a dollar, and second, even if there were 5% of the community that believed that, they would be the mistaken 5% of the community, and so that has no bearing on my belief. I might believe to 1e-10 that the LHC won’t destroy the Earth, but only to 1e-1 that more than 95% of the relevant physicists will have carefully done the relevant calculations before coming to an opinion.
With freshman-level physics, you can get a strong idea you shouldn’t be worried by tiny black holes. With much higher physics, you can calculate it and see that the speed this thing is traveling at isn’t the limiting factor.
The effort of making that bet, and trying to get you to pay up the $1, will be a lot harder than many other ways I could earn $1.
So, the expected utility of the bet, even with 100% certainty of being right, is still negative.
No, it’s a much smaller order of number than that. You’re still starting from “but there’s a chance, right?”
The rest of your post is reasoning from your own ignorance of the specific topic of the LHC, but not from that of everyone else. You appear not to have grasped the point of what I just wrote. Please echo back your understanding of what I wrote.
I understand you as saying that cosmic ray collisions that happen all the time are very similar to the sort of collisions at CERN, and since they don’t cause apocalypses, CERN won’t either. And that because the experiment has been tried before millions of times in the form of cosmic rays, this “CERN won’t either” isn’t on the order of “one in a million” or “one in a billion” but is so vanishingly small that it would be silly to even put a number to it.
Tell me if I understood you correctly and if I did I will try to rephrase my post and my objections to what you said so they are more understandable.
Not millions of times. Not even just billions of times.
From a back of the envelope calculation they’ve been tried >10^16 times a year.
For the past 10^9 years.
That’s 10^25 times
And that’s probably several orders of magnitude low.
So yes, treating it as something with a non-zero probability of destroying the planet is silly.
Especially because every model I’ve seen that says it’d destroy the planet would also have it destroy the sun. Which has 10^4 times the surface area of the Earth, and would have correspondingly more cosmic ray collisions.
Read page 848 of http://arxiv.org/ftp/arxiv/papers/0912/0912.5480.pdf
I’m guessing you weren’t aware of all the technical intricacies of this argument nor the necessity of bringing in white dwarf stars to clinch it. Now, it turns out you got lucky, because white dwarf stars do end up clinching the argument. But if there’s a facet of the argument you don’t understand, or there’s even a tiny possibility there’s a facet of the argument you don’t fully understand, you don’t go saying there’s zero probability.
Voted up, because you raise a good point.
Although I had considered the fact that the LHC reactions are closer to Earth-stationary, I hadn’t actually bothered to try and find out how likely multi-particle production from 10^12ev+ cosmic rays would be, and I wouldn’t even be sure how to calculate that in, in order to find out how likely ~Sol-stationary production events are, starting from very high energy cosmics.
And that this puts a strong upper bound on the chances.
If you multiplied it by the next thousand generations of humans on earth you wouldn’t get 1E-6 of a human life equivalent.
So if you can stop using huge numbers like 1E-9, please do proceed, because you do understand the numbers of calculating costs in human life equivalents better than me!
My problem with what you’ve been writing is not your calculations, but the numbers you’re using. Even if the cost were 6E12 lives, it’s still not worth actually worrying about. You’re demonstrating a comprehensive lack of actual domain knowledge—you literally don’t know the thing you’re talking about—and appear to be trying to compensate for that by leveraging what you do know. This tendency is natural, and it’s the usual first step to trying to quickly understand a new thing, but doesn’t tend to give sensible and useful results and may hurt the heads of those with even very slightly more domain knowledge. (When Kurzweil started talking about the brain and genome in terms of computer science, Myers’ response is best understood as “AAAAAAAAAAA STOP BEING STUPID DAMMIT”.)
On a related issue in the domain, here’s a writeup I liked on the thorny problem of how the hell those laymen found sitting on the bench in a court might try to deal with such issues.
As far as I can tell, everything Yvain has said on this topic is correct. In particular, there is a further possible assumption under which it is not the case that cosmic ray collisions with Earth and the Sun prove LHC black holes would be safe, as you can find spelled out in section 2.2 of this paper by Giddings and Mangano. As Yvain pointed out in a different comment, to plug this hole in the argument requires doing some calculations on white dwarfs and/or neutron stars to find a different bound, which is what Giddings and Mangano spend much of the rest of the paper doing. These calculations, as far as I know, were not actually published until 2008 -- several months after the LHC was originally supposed to go online. It’s my impression that both before and after this analysis was done, most of those arguing the LHC is safe just repeated the simplified argument that had the hole in it; see e.g. Kingreaper in this thread. And while I’d put a very low probability on these calculations being wrong and a very low probability on the LHC destroying the world even if the calculations were wrong, it’s this sort of consideration and not 1 in 10^25 coincidences that ends up dominating the final probability estimate. Then there were all these comments about the LHC causing the end of the world being as unlikely as the LHC producing dragons etc—which if taken literally seem annoyingly wrong because of how the end of the world, unlike dragons, is a convergent result of any event sufficiently upsetting to the physical status quo. So while (just because of the multiple unlikely assumptions required) at any point and especially after the Giddings/Mangano analysis a reasonable observer would have had to put an extremely low probability on existential risk from LHC black holes, the episode still makes me update against trusting domain experts as much on questions that are only 90% about their domain and 10% about some other domain like how to interpret probabilities.
Because of conservation of both momentum and energy, particles coming out of the LHC are no slouch either. So although under extremely hypothetical conditions, stable black holes can exist without the sun being destroyed by cosmic rays, even then you need to add even more hypotheticals to make the LHC dangerous.
Note that their very hypothetical scenario is already discouraged by many orders of magnitude by Occam’s razor. I’m not sure what the simplest theory that doesn’t have black holes radiate but does have pair production near them is, but it’s probably really complicated. And then these guys push it even further by requiring that these black hole-like objects not destroy neutron stars either!
I certainly don’t disagree that there are a number of unlikely hypotheticals here that together are very improbable.
My impression from reading had been that, while the typical black hole that would be created by LHC would have too high momentum relative to Earth, there would be a distribution and with reasonably high probability at least one hole (per year, say) would accidentally have sufficiently low momentum relative to Earth. I can’t immediately find that calculation though.
If P(black holes lose charge | black holes don’t Hawking-radiate) is very low, then it becomes more reasonable to skip over the white dwarf part of the argument. Still, in that case, it seems like an honest summary of the argument would have to mention this point, given that it’s a whole lot less obvious than the point about different momenta. G & M seem to have thought it non-crazy enough to devote a few sections of paper to the possibility.
Even producing a black hole per year is doubtful under our current best guesses, but if one of a few extra-dimension TOEs are right (possible) we could produce them. So there’s sort of no “typical” black hole produced by the LHC.
But you’re right, you could make a low-momentum black hole with some probability if the numbers worked out. I don’t know how to calculate what the rate would be, though—it would probably involve gory details of the particular TOE. 1 per year doesn’t sound crazy, though, if they’re possible.
I don’t know if you’re on board with the Bayesian view of probability, but the way I interpret it, probability is a subjective level of confidence based on our own ignorance. In “reality”, the “probability” that the LHC will destroy the Earth is either 0 or 1 - either it ends up destroying the Earth or it doesn’t—and in fact we know it turned out to be 0. What we mean when we say “probability” is “given my level of ignorance in a subject, how much should I expect different scenarios to happen”.
So when I ask “what is your probability of the LHC destroying the world”, I’m asking “Given what you know about physics, and ignoring that both of us now know the LHC did not destroy the world, how confident should you have been that the LHC would not destroy the world”.
I’m not a particle physicist, and as far as I know neither are you. Both of us lack comprehensive domain knowledge. Both of us have only a medium-level of broad understanding of the basic concepts of particle physics, plus a high level of trust in the conclusion that professional particle physicists have given.
But I’m doing what one is supposed to do with ignorance—which is not say I’m completely totally sure of the subject I’m ignorant about to a certainty of greater than a billion to one. Unless you are hiding a Ph.D in particle physics somewhere, your ignorance is not significantly less than my own, yet you are acting as if you had knowledge beyond that of even the world’s greatest physicists, who are hesitant to attach more than a fifty million to one probability to that estimate.
This is what I meant by offering you the bet—trying to show that you were not, in fact, so good at physics that you could make billion to one probability estimates about it. And this is why I find your argument that I’m ignorant to be such a poor one. Of course I’m ignorant. We both are. But only one of us is pretending to near absolute certainty.
That doesn’t seem to be the case when considering quantum mechanics. If, since the LHC was run, we had counterfactually accrued evidence that a significant proportion of those Many Worlds were destroyed then it would be rather confusing to say that the probability turned out to be 0. This can mostly be avoided by being particularly precise about what we are assigning probabilities to. But once we are taking care to be precise it is clear that the thing that there was ‘ignorance’ about and the thing that we know now to be 0 are not the same thing. (ie. An omniscient being would possibly not have assigned 0 prior to the event.)
And here I was expecting you to actually run the numbers.
I’m not a particle physicist, but I do know quite a bit more about the actual numbers to start a calculation from than you do, because I bothered finding them out, and your citation so far appears to be someone else who didn’t bother finding them out. This is what I mean by “reasoning from ignorance” and “even very slight domain knowledge”.
You did run your numbers assuming events the LHC maximum and greater happen all the time, right?
The probability of the sun not coming up tomorrow is greater than 0, but in any practical sense I’d be a drooling lackwit to waste time calculating it.
I appreciate you’re offering a teachable moment about probability, but you really, really aren’t saying anything useful or sensible about the LHC, as you claimed to be.
So far the only number introduced here has been Rees’ “one in fifty million”. You’ve consistently avoided giving a number, using only the “but there’s still a chance” thing, which in my interpretation you’re using diametrically against its intended meaning (intended meaning is that you can’t just use a binary “there is a chance” versus “there’s not a chance”, you actually have to worry about what the chance is). The only thing you’ve said that suggests any level of familiarity with the subject is mentioning the cosmic ray collisions, which were all over the newspapers during the necessary time period, and most of your comments make me think you’re not familiar with the various arguments that have been put forward that the LHC collisions are in fact different from cosmic ray collisions.
But I don’t actually think that matters. Assuming the prior for the LHC destroying the world before your cosmic-ray arguments and whatever other arguments you want to offer is non-negligible, you’re saying you’re certain to within one in (your probability of LHC destroying world/prior) that all your arguments are correct. Since you seem willing to give arbitrarily low probabilities, I’m sure we could fiddle with the numbers so that you’re saying you think there’s less than a one in a million chance there’s any flaw in your argument, or that you’re applying your argument wrong, or that you missed some good reason why LHC collisions don’t have to be different from cosmic ray collisions, or that the energy of cosmic ray collisions has been consistently overestimated relative to the energy of LHC collisions, or that you’re just having a really bad day and your brain is tired and you don’t realize that the argument doesn’t prove what you think it proves. I believe you’re very smart, and I realize the prior is low, and I realize the arguments against the LHC destroying the world are very good, but predicting a novel situation that some smart people disagree upon in a field you don’t understand to a level greater than one in a billion is just a bad idea.
The “but there’s still a chance” principle only means that you shouldn’t act as if you can keep on believing your argument even when the chance grows ridiculously low. It doesn’t mean that you should never keep a tiny portion of probability mass on “or maybe I’m missing something” to compensate for unknown unknowns.
This discussion is not getting anywhere, so I will let you have the last word and then bow out unless you want to continue by private message.
LHC can be safe despite the argument for its safety being flawed.
I looked back through and see that I indeed did not actually give a number. My sincere apologies for this. Not greater than 1 in 3E22.
(1E31x2E17=2E48 LHC-level collisions since the formation of the Universe, and yet everything we can see in the sky is still there. 3E22 LHC-level collisions, or ~1E6 LHC experimental lifetimes, with Earth itself in 4.5E9 years. “There is no indication that any of these previous ‘LHC experiments’ has ever had any large-scale consequences. The stars in our galaxy and others still exist, and conventional astrophysics can explain all the astrophysical black holes detected. Thus, the continued existence of the Earth and other astronomical bodies can be used to constrain or exclude speculations about possible new particles that might be produced by the LHC.” They do commit the fatal error of assuming that a negligible probability means “impossible”, so therefore the paper should of course be ignored.)
Sorry, I know I said I’d stop, and I will stop after this, but that 3E22 number is just too interesting to leave alone.
The last time humanity was almost destroyed was about 80,000 years ago, when a volcanic eruption reduced the human population below 1,000. So say events that can destroy humanity happen on average every hundred thousand years (conservative assumption, right?). That means the chance of a humanity-destroying event per year is 1/100,000. Say 90% of all humanity-destroying events can be predicted with at least one day’s notice by e.g. asteroid monitoring. This leaves hard-to-detect asteroids, sudden volcanic eruptions, weird things like sudden methane release from the ocean, et cetera. So about once per 1 million years we get an unexpected humanity-destroying event. That means the "background rate" of humanity-destroying events is about 1/300 million days.
Suppose Omega told you, the day before the LHC was switched on, that tomorrow humankind would be destroyed. If 1/3E22 were your true probability, you would say “there’s still vastly less than one in a billion chance the apocalypse has anything to do with the LHC, it must just be a coincidence.” Even if you were the LHC project coordinator, you might not even bother to tell them not to switch on the project, because it wasn’t worth the effort it would take to go to the telephone.
Let’s look at it a different way. Suppose a scientist has a one in a thousand chance of having a psychotic break. Now suppose the world’s top physicist, so brilliant as to be literally infallible as long as he is sane, comes up with new calculations that say the LHC will destroy the world. Suppose you ask the world’s best psychiatrist, who also is literally never wrong, whether the physicist is insane, and she says no. If your probability is truly 1/3E22, then it is more likely that both the physicist and the psychiatrist have simultaneously gone insane than that the physicist is correct; what is more, even if you have no other evidence bearing on the sanity of either, your probability should still be less than one in a trillion that the LHC will destroy the Earth.
There was some discussion in LW a while back on how it might be a prediction of anthropic theory that if the LHC destroys the world, improbable occurrences will prevent the LHC from working. Suppose the LHC is set up so well that the only thing that could stop it from running is a direct asteroid hit to Geneva, such that if turning the LHC on would destroy the world, we would observe an asteroid strike to Geneva with probability 1. Let’s say a biggish asteroid hits the Earth about once every thousand years (last one was Tunguska), and that each one affects one one-hundredth of the Earth’s surface (Tunguska was much less, but others could be bigger). That means there’s a 1/30 million chance of a big asteroid strike to Geneva each day. If your true probability is 1/3E22, you could try to turn the LHC on, have an asteroid strike Geneva the day before, and still have less than a one in a billion chance that the asteroid was anything other than a random asteroid.
In fact, all of these combined do not equal 3E22, so if the world’s top infallible physicist agreed the LHC would destroy the world and was certified sane by an infallible psychiatrist, AND an asteroid struck Geneva the last time you planned to turn the LHC on, AND you know the world will end the day the LHC is activated, then if your real probability was 1/3E22 you should now (by my calculations) think that, on balance, there’s about a one in three chance the LHC would destroy the world.
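A rough odds-form version of this combination, using only the stipulated numbers from the scenarios above:

    prior_odds = 1 / 3e22   # the probability under discussion (odds and probability
                            # are indistinguishable at this size)

    lr_omega     = 3e8      # world ends on switch-on day vs. ~1/(300 million) background daily rate
    lr_physicist = 1e6      # infallible physicist + psychiatrist vs. both insane (1e-3 each)
    lr_asteroid  = 3e7      # asteroid hits Geneva that day vs. ~1/(30 million) daily chance

    posterior_odds = prior_odds * lr_omega * lr_physicist * lr_asteroid   # about 0.3
    print(posterior_odds / (1 + posterior_odds))   # about 0.23, roughly the "one in three" ballpark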
This is why I don’t like using numbers like 3E22 as probabilities.
This seems in conflict with http://en.wikipedia.org/wiki/Toba_catastrophe_theory
The estimates there range from 2,000 to 20,000 individuals.
The population may not have been significantly bigger before the eruption:
http://www.physorg.com/news183278038.html
A volcanic eruption is obviously much less likely to threaten humanity’s existence today than when there were only a handful of us in the first place.
Brilliant. Is there any chance I could persuade you to present this as a top level post on the front page? This is a comment I expect to reference when related subjects come up in the future.
I haven’t yet entered this particular discussion, but it is of interest to me, so I hope you won’t mind persisting a bit longer, with a different interlocutor.
May I ask just what your lower bound is on probability estimates?
I can’t, really, because it’s context dependent. If the question was “What is the probability that a program which selects one atom at random from all those in the universe (and is guaranteed by Omega genuinely random) picks this particular phosphorus atom here on the tip of my finger”, then my probability would be much less than 1/3E22.
Likewise, “destroy the Earth” is a relatively simple occurrence—it just needs a big enough burst of energy or mass or something. If it’s “What is the probability that the LHC will create a hamster in a tutu on top of Big Ben at noon on Christmas Day singing ‘Greensleeves’ while fighting a lightsaber duel with the ghost of Alexander the Great”, then my probability would again be less than 1/3E22 (at least before I formed this thought—I don’t know if having said it aloud makes the probability that malevolent aliens will enact it go above 1/3E22 or not).
Thanks for the clarification; that’s quite reasonable.
I’ll note, however, that your own arguments (the world’s greatest physicist certified sane by the world’s greatest psychiatrist...) still apply!
The point being that our “counterintuitiveness detector” shouldn’t get to automatically override calculated probabilities, especially in situations that intuition wasn’t designed to handle.
As for the LHC, it’s worth pointing out that potential benefits also have to be factored into the expected utility calculation, a fact which I don’t think I’ve seen mentioned in the current discussion.
Yvain: [...] “What is the probability that the LHC will create a hamster in a tutu on top of Big Ben at noon on Christmas Day singing ‘Greensleeves’ while fighting a lightsaber duel with the ghost of Alexander the Great”, then my probability would again be less than 1/3E22 (at least before I formed this thought—I don’t know if having said it aloud makes the probability that malevolent aliens will enact it go above 1/3E22 or not).
komponisto: Thanks for the clarification; that’s quite reasonable.
^Awesome :-)
Or in a slightly different variant of the experiment, if your real probability is 1/3E22, if someone reliably told you that in a year from now you’d assign a probability of 1/3E12, you’d have to conclude it was probably because your rationality was going to break down (assuming the probability of such breakdowns isn’t too extremely low).
Okay, now I’m confused, or misunderstanding you.
Starting this discussion, I gave a probability of one in a million. After reading up on the subject further, I found a physicist who said one in fifty million, and am willing to bow to his superior expertise.
Was there only a one in fifty chance my probability would change this much? This doesn’t seem right, because I knew going in that I knew very little about the subject and if you’d asked me whether I expected my probability to change by a factor of at least fifty, I would have said yes (though of course I couldn’t have predicted in which direction).
It seems to me it would be fine for David to believe with high probability that he would get new evidence that would change his probability to 1/3E12, as long as he believed it equally possible he’d get new evidence that would change it to 1/3E32.
A 1/3e22 probability means you believe there’s a 1/3e22 chance of the event happening.
If you have, for example a 1/1e9 chance of finding evidence that increases that to 1/3e12, then you have a 1/1e9*1/3e12 chance of the event happening.
Which is 1/3e21.
So, in order to be consistent, you must believe that there is, at most, a 1/1e10 chance of you finding evidence that increases the probability to 1/3e12.
At which point, the probability of losing your rationality is obviously higher.
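A quick arithmetic check of the constraint described here (figures from this exchange):

    p_event          = 1 / 3e22   # probability currently assigned to the event
    p_event_if_found = 1 / 3e12   # probability after the hypothesized future evidence

    p_find = 1e-9                 # supposed chance of ever finding that evidence
    print(p_find * p_event_if_found)    # 1/3e21 > 1/3e22, so the three numbers are inconsistent

    # Largest chance of finding such evidence that stays consistent with p_event:
    print(p_event / p_event_if_found)   # 1e-10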
Yes. Yvain’s 1 in 50 million example, on the other hand, is fine, because the probability went down. In a more extreme example, it could have had a 50-50 chance of going down to 0 (dropping by a factor infinity) as long as there had been a 50-50 chance of it doubling. Conservation of expected evidence.
On the one hand, everything you say would be true, if we were assigning consistent probabilities.
On the other hand, I’ve never been able to assign consistent probabilities over the LHC and knowing this hasn’t helped me either.
In several places I’d say you tilt your probability estimates in the most favorable direction for your argument. For example, you underestimate how much evidence the meteorite would give: 1/100th of the earth’s surface destroyed every 1000 years is far too much. There have been 0 humanity-wiping-out events so far over 1 million-ish years; this does not work out to P=10^-5. In estimating based off of expert opinion you load the intuitive die with “the calculations say” rather than “the physicist says”; calculations are either right or wrong.
I agree that the estimate of 10^-22 is likely too low, but I have a negative reaction to how you’re arguing it.
The post you’re responding to didn’t use 3E22 as a probability. It gave 3E22 as a number of previous experiments.
Now, as the link you cited in this response shows, they’re not necessarily quite identical experiments (although some might result in identical experiments).
But you’re attacking an error which was not made.
“1 in 3E22” was surely a probability. Yvain made a typo at the end of his comment.
Ah, I was mistaken. For some reason I didn’t notice the link.
Are there any alternative colour-schemes for this site? Links seem to show up poorly on blue-backgrounded posts.
I would like you to consider turning your comment into a top-level post. Thanks.
So what was your answer to the original question of if the LHC should be switched on? Your citation to Lifeboat is saying they really think it shouldn’t. I presumed you had posted this because you agreed.
That is: you have numbers yourself now. Are those numbers strong enough for you to seriously advocate the LHC should be switched off and kept off?
If “yes”, what are you doing about it? If “no”, then I don’t understand the point of all the above.
I was kinda hoping you wouldn’t ask that. This whole thing came up because I said it was “reasonable” to worry about the LHC, and I stick to that. But the whole thing seems like a Pascal’s Mugging to me, and I don’t have a perfect answer to that class of problem.
I don’t think it should be switched off now, because its failure to destroy the world so far is even better evidence than the cosmic ray argument that it won’t destroy the world the next time it’s used. But if you’d asked before it was turned on? I guess I would agree with Aleksei Riikonen’s point in one of the other LW threads that this is really the sort of thing that could be done just as well after the Singularity.
But I also agree with Eliezer (I could have avoided this entire discussion if I’d just been able to find that post the first time I looked for it when you asked for a citation!) that in reality I wouldn’t lose sleep over it. Basically, I notice I am confused, and my only objection to you was the suggestion that reasonable people couldn’t worry about it, not that I have any great idea how to address the issue myself.
You mean, asking the whole actual real-life question at hand: whether the LHC is too risky to run.
“Is it reasonable to think X?” is only a useful question to consider in relation to X as part of the actual discussion of X. It’s not a useful sort of question in itself until it’s applied to something. Without considering the X itself, it’s a question about philosophy, not about the X. If you’re going to claim something about the LHC, I expect you to be saying something useful about the LHC itself.
Given you appear to regard application as a question you’d rather not have asked, what expected usefulness should I now assign to going through your comments on the subject in close detail, trying to understand them?
(I really am going WHAT? WHAT THE HELL WAS THE ACTUAL POINT OF ALL THAT, THAT WAS WORTH BOTHERING WITH? If you’re going to claim something about the LHC, I expect you to be saying something useful about the LHC itself.)
Yvain was, I suspect, trying to illustrate failures in your thought, rather than in your conclusion.
If you see someone arguing that dogs are mammals because they have tongues, you may choose to correct them, despite agreeing with their conclusion. Especially if you’re on a board related to rationality.
You don’t think an argument that something which you thought was certain is actually confusing is valuable? If an agnostic convinced a fundamentalist that God’s existence was less cut-and-dried obvious than the fundamentalist had always thought, but admitted ey wasn’t really sure about the God question emself, wouldn’t that still be a useful service?
This reads to me as an admission that you were not, nor were you intending to, at any point say anything useful or interesting about the LHC. This suggests that if you want people not to feel like you’re wasting their time and leading them on a merry dance rather than talking about the apparent topic of discussion (which is how I feel now—well and properly trolled. Well done.) then you may want to pick examples where you don’t have to hope no-one ever asks “so what is the point of all this bloviating?”
You asked for a citation for my mention that worrying about the LHC was “reasonable”. I interpreted “reasonable” to mean “there are good arguments for not turning it on”. I am not sure whether I fully believe those arguments and I am confused about how to deal with them, but I do believe there are good arguments and I presented them to you because you asked for them. I didn’t enjoy spending a few hours defending an assertion I made that was tangential to my main point either.
Aside from the whole “ability to think critically about probabilities of existential risk will probably determine the fate of humankind and all other sapient species” thing, no, it doesn’t have any practical implications. But this is a thread about philosophy on a philosophy site, and you asked a philosophical question to a former philosophy student, so I don’t think it’s fair to expect me to anticipate that you wanted to avoid discussions that were purely philosophical.
Seriously, and minus the snark, it’s possible I don’t understand your objections. I promise I was not trying to troll you and I’m sorry if you feel like this has wasted your time.
1/3E22 seems hugely overconfident.
Voted up for excellent points all around.
I have in fact completely given up on giving probability estimates more extreme than +/-40 decibels, or 50-60 in some extreme (and borderline trivial) cases. I haven’t actually adjusted planning to compensate for the possible loss of fundamental assumptions, though, so I may be doing it wrong… On the gripping hand, however, most of the probability mass in the remaining options tends to be impossible to plan for anyway.
This is plausible and I shall contemplate it.
By the way, and a little bit on topic, I think it’s not a coincidence that an inverse casino would be more expensive to run than a regular casino.
But Komponisto’s idea is not to do with how many utils the universe contains.
Incidentally, it seems to me that if it’s possible to make a credible threat to destroy the universe, then our main problem is not Pascal’s mugging but the fragility of the universe.
As I understand it, komponisto’s idea is that we don’t have to worry about Pascal’s Mugging because the probability of anyone being able to control 3^^^^3 utils is even lower than one would expect simply looking at the number 3^^^^3, and is therefore low enough to cancel out even this large a number.
What I am trying to respond is that there are formulations of Pascal’s Mugging which do not depend on the number 3^^^^3. The idea that someone could destroy a universe worth of utils is more plausible than destroying 3^^^^3 utils, and it’s not at all obvious there that the low probability cancels out the high risk.
Well, it may not be obvious what to do in that case! But the original formulation of the Pascal’s Mugging problem, as I understand it, was to formally explain why it is obvious in the case of large numbers like 3^^^^3.
The answer proposed here is that a “friendly” utility function does not in fact allow utility to increase faster than complexity increases.
I don’t claim this tells us what to do about the LHC.
What Sewing-Machine said. A solution of the Pascal’s mugging problem certainly doesn’t imply that existential risks aren’t to be worried about!
This requirement (large numbers that refer to sets have large Kolmogorov complexity) is a weaker version of my and RichardKenneway’s versions of the anti-mugging axiom. However, it doesn’t work for all utility functions; for example, Clippy would still be vulnerable to Pascal’s Mugging if using this strategy, since he doesn’t care whether the paperclips are distinct.
Hm, that solution seems like the one I gave (ironically, on a Clippy post), where I said that if you’re allowed to posit these huge utilities from complex (and thus improbable) hypotheses, you also have to consider hypotheses that are just as complex but give the opposite utility. But in the link I gave, people seemed to find something wrong with it: specifically, that the mugger gives an epsilon of evidence favoring the “you should pay”-supporting hypotheses, making them come out ahead.
So … what’s the deal?
Arranging your probability estimates so that predictions of opposite utility cancel out is one way to satisfy the anti-mugging axiom. It’s not the only way to do so, though; you can also require that the prior probabilities of statements (without corresponding opposite-utility statements) shrink at least as fast as utilities grow. There’s no rule that says that similar statements with positive and negative utilities have to have the same prior probabilities, unless you introduce it specifically for the purpose of anti-mugging defense.
My favored solution. Incidentally, if your prior shrinks faster, then you can still be vulnerable. The mugger can simply split his offer up into a billion smaller offers, which will avoid the penalty of big offers disproportionately being discounted. So unless you would reject every single mugging offer of any magnitude (in which case isn’t that kind of arbitrary?), the faster shrinking doesn’t buy you anything.
I believe a set of smaller offers would imply the existence of a statement which aggregates them and violates this formalization of the anti-mugging axiom.
On the other hand, you can potentially be forced to search the space of all functions for the one that diverges, and it might be possible (I don’t know whether it is) to mug in a way that makes finding that function computationally hard.
I take the aggregating thing as a constructive proof that that class of priors + utility function is vulnerable; your version just seems to put it another way. We agree on that part, I think.
I believe there is such a rule, which doesn’t have to be introduced ad hoc, and which follows from the tenets of algorithmic information theory. Per the reasoning I gave in the linked post, an arbitrary complex conclusion you locate (like the one in Pascal’s mugging) necessarily has a corresponding conclusion of equal complexity, but with the right predicate(s) inverted so that the inferred utility is reversed.
Because (by assumption) the conclusion is reached through arbitrary reasoning, disentangled from any real-world observation, you need no additional complexity for a hypothesis that critically inverts the first one. Since no other evidence supports either conclusion, their probability weights are determined by their complexity, and are thus equal.
That’s why I don’t think you need to introduce this reasoning as an additional axiom. However, as a separate matter (and whether or not you need it as an axiom), I thought this argument was refuted by the fact that the mugger, simply through assertion, introduces an arbitrarily small amount of evidence favoring one hypothesis over its inverse. If it refutes the defense I gave in the link, it should work against the anti-mugging axiom you’re using as well.
Thanks for the links; I seem to have missed that post.
There is an idea here, but it’s a little muddled. Why should complexity matter for Pascal’s mugging?
Well, the obvious answer to me is that, behind the scenes, you’re calculating an expected value, for which you need a probability of the antagonist actually following through. More complex claims are harder to carry out, so they have lower probability.
A separate issue is that of having bounded utility, which is possible, but it should be possible to do Pascal’s mugging even then, if the expected value of giving them money is higher than the expected value of not.
Anyhow, just “complexity” isn’t quite a way around Pascal’s mugging. It would be better to do a more complete assessment of the likelihood that the threat is carried out.
Among other things, the ability of the mugger to communicate the threat depends on the complexity of the threat.
This isn’t really the limiting reagent in the reaction, though. I can communicate all sorts of awful things (sorry, had to share—it’s totally my fault if you end up reading the entire thread) much more easily than I can do them.
Not for things with values in the range of 3^^^^3 -- in such a case the difference between ability-to-communicate and ability-to-carry-out is pretty much negligible. (The complexity of an action with 3^^^^3 units of disutility is right around 3^^^^3, under my proposal.)
Ah shoot, I read this post, and then I read SewingMachine’s post, and then I realized my reply to this post was wrong.
I’ll repeat my other comment. log(N) is an upper bound for the complexity of N, but complexity of N can be much smaller. Complexity of 3^^^3 is tiny compared to log(3^^^3).
Oh, you totally got ninja’d.
I think that the more general problem is that if the absolute value of the utility that you attach to a world-state increases faster than its probability (given the current situation) decreases, then the very possibility of that world-state existing will cause it to hijack the entirety of your utility function (assuming that there are no other world-states in your utility function which go FOOM in a similar fashion).
Of course, utility functions are not constructed to avoid this problem, so I think that it’s incredibly likely that each unbounded utility function has at least one world-state which would render it hijackable in such a manner.
Yes, that’s exactly the problem.
Well, they had better be, or they will fall victim to it.
You have to choose one of the following: (1) Pascal’s Mugging; (2) Scope Insensitivity (bounding utility by improbability); or (3) Wishful Thinking (bounding improbability by utility).
We often call such things a ‘problem’, yet by definition it is exactly how things should be. If your utility function genuinely represents your preferences (including preferences with respect to risk), then rejoice in the opportunity to devote all your resources to the possibility in question! If it doesn’t, then the only ‘problem’ is that your ‘utility function’, well, isn’t your actual utility function. It’s the same problem you get when you think you like carrots when you really like peaches.
Voluntary dedication is not ‘hijacking’.
(Response primarily directed to quoted text and only a response to the parent in as much as it follows the problem frame.)
Agreed.
Our heuristics hijack our volition?
Don’t see how your idea defeats this:
Having a bounded utility function defeats that.
Without invoking complexity, one can say that an agent is immune to this form of Pascal’s mugging if, for fixed background information I, the quantity P(x amount of utility | I) goes to zero faster than 1/x as x grows, so that the expected-utility contribution x * P(x amount of utility | I) vanishes.
If the agent’s utility function is such that “x amount of utility” entails “at least f(x) bits of complexity,” with f(x) - log2(x) --> infinity (not merely f(x) --> infinity), then this will hold for priors that are sensitive to complexity.
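A toy numeric illustration of that growth-rate condition (both complexity penalties below, and the prior form 2^(-f(x)) itself, are stand-ins chosen purely for the sketch, not anyone’s actual prior):

```python
from math import log2

# Toy sketch: total expected-utility contribution of "x utilons" hypotheses,
# each weighted by an assumed prior of 2^(-f(x)). Both penalty functions
# below are illustrative stand-ins, not anyone's actual prior.

def mugger_expected_utility(f, x_max=10**5):
    """Sum of 2^(-f(x)) * x for x = 1..x_max."""
    return sum(2.0 ** (-f(x)) * x for x in range(1, x_max + 1))

def linear(x):
    return x                 # "x utilons" forces about x bits of complexity

def logarithmic(x):
    return log2(x) + 1       # "x utilons" forces only about log2(x) bits

print(mugger_expected_utility(linear))       # ~2.0: converges, mugging fails
print(mugger_expected_utility(logarithmic))  # ~x_max / 2: grows without bound
```

Whether the sum converges depends on whether f(x) outpaces log2(x), which is exactly the distinction drawn above.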
See Sewing-Machine’s comment. The smallness of the probability isn’t fixed, if the probability is controlled by complexity, and complexity controls utility.
More precisely, the probability that the mugger can produce arbitrary amounts of utility is dominated by (the probability that the mugger can produce more than N units of utility), for every N; and as the latter is arbitrarily small for N sufficiently large, the former must be zero.
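In symbols, writing E_N for the event that the mugger can produce more than N units of utility:

```latex
P(\text{mugger can produce arbitrary amounts})
\;=\; P\Bigl(\,\bigcap_{N} E_N\Bigr)
\;\le\; \inf_{N} P(E_N)
\;=\; 0 .
```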
A corollary is a necessary condition for friendliness: if the utility function of an AI can take values much larger than the complexity of the input, then it is unfriendly. This kills Pascal’s mugging and paperclip maximizers with the same stone. It even sounds simple and formal enough to imagine testing it on a given piece of code.
How does that work at all? They’re not measured by the same unit (bits vs. utils), and you can multiply a utility function by a positive constant or add or subtract an arbitrary constant and still have it represent the same preferences.
The question and Weissman’s answer are good, so this is just a distraction: are utils and bits really thought of as units? The mathematical formalism of e.g. physics doesn’t actually have (or doesn’t require) units, but you can extract them by thinking about the symmetries of the theory: e.g. distance is measured in the same units vertically and horizontally because the laws of physics stay the same after changing some coordinates. How do people think about this in economics?
The concept can be rescued, at least from that objection, by saying instead that there should be some value alpha such that, for any description of a state of the universe, the utility of that state is less than alpha times the complexity of that description. That is, utility is asymptotically at most linear in complexity.
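In symbols (K(s) is the complexity of the description of state s; I use |u| so that large disutilities are covered as well, which is my own reading rather than something stated above):

```latex
\exists\, \alpha > 0 \;\; \forall\, s : \qquad \lvert u(s) \rvert \;\le\; \alpha \cdot K(s) .
```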
However, the utility function still isn’t up for grabs. If our actual true utility function violates this rule, I don’t want to say that an AGI is unfriendly for maximizing it.
Of course. The proposal here is that “our actual true utility function” does not violate this rule, since we are not in fact inclined to give in to a Pascalian mugger.
Sometimes, when I go back and read my own comments, I wonder just what goes on in that part of my brain that translates concepts into typed out words when I am not paying it conscious attention.
Anyways, let “our actual true utility function” refer to the utility function that best describes our collective values that we only manage to effectively achieve in certain environments that match the assumptions inherent in our heuristics. Thinking of it this way, one might wonder if Pascalian muggers fit into these environments, and if not, how much does our instinctual reaction to them indicate about our values?
I think I agree.
Perhaps one way to state the complexity-of-value thesis would be to say that the utility function should be bounded by Kolmogorov complexity.
It doesn’t quite kill Pascal’s mugging—the threat does have to have some minimum level of credibility, but that minimum credibility can still be low enough that you hand over the cash. Pascal’s mugging is only killed if the expected utility of handing over the cash is negative. To show this I think you really do need to evaluate the probability in full.
Neither does it kill paperclip maximizers. A bunch of N paperclips requires about log2(N) bits to describe, plus the description of the properties of a paperclip. So the paperclip maximizer can still have an ever-increasing utility as it makes more paperclips; your rule would just bound that utility to growing like log(N).
Good line of thought though: there may still be something in here.
To “kill Pascal’s mugging” one doesn’t have to give advice on how to deal with threats generally.
I think that N paperclips takes about complexity-of-N, plus the complexity of a paperclip, bits to describe. “Complexity of N” can be much lower than log(N); e.g., the complexity of 3^^^3 is smaller than the length of the Wikipedia article on Knuth’s up-arrow notation. “3^^^3 paperclips” has very low complexity and very high utility.
Ah, you’re right.
But I think that a decision theory is better (better fulfills desiderata of universality, simplicity, etc.) if it treats Pascal’s mugging with the same method it uses for other threats.
Why? Is “threat” a particularly “natural” category?
From my perspective, Pascal’s mugging is simply an argument showing that a human-friendly utility function should have a certain property, not a special class of problem to be solved.
Hah. Well, we can apply my exact same argument with different words to show why I agree with you:
This will be the case in the scenario under discussion, due to the low probability of the mugger’s threat (in the “3^^^^3 disutilons” version), or the (relatively!) low disutility (in the “3^^^^3 persons” version, under Michael Vassar’s proposal).
Yes; it would be a “less pure” paperclip maximizer, but still an unfriendly AI.
The rule is (proposed to be) necessary for friendliness, not sufficient by any means.
I really like this suggestion. One esthetic thing it has going for it: complexity should be a terminal value for human-relatable intelligent agents anyway. It seems gauche for simple pleasures (orgasms, paperclips) to yield unbounded utility.
The problem, as stated, seems to me like it can be solved by precommitting not to negotiate with terrorists—this seems like a textbook case.
So switch it to Pascal’s Philanthropist, who says “I offer you a choice: either you may take this $5 bill in my hand, or I will use my magic powers outside the universe to grant you 3^^^^3 units of utility.”
But I’m actually not intuitively bothered by the thought of refusing the $5 in that case. It’s an eccentric thing to do, but it may be rational. Can anybody give me a formulation of the problem where taking the magic powers claim seriously is obviously crazy?
The two situations are not necessarily equivalent.
See my most recent response in the Pascal’s Mugging thread—taking into account the Mugger’s intentions & motives is relevant to the probability calculation.
Having said that, probably the two situations ARE equivalent—in both cases an increasingly high number indicates a higher probability that you are being manipulated.
That can work when the mugger is a terrorist. Unfortunately most muggers aren’t; they’re businessmen. Since the ‘threat’ issue isn’t intended to be the salient feature of the question, we can perhaps specify that the mugger would be paid $3 to run the simulation and is just talking to you in the hope of getting a better offer. You do negotiate under those circumstances.
For my part I don’t like the specification of the problem as found on the wiki at all:
Quite aside from the ‘threat’ issue I just don’t care what some schmuck simulates on a Turing machine outside the matrix. That is a distraction.
No responses and a downvote. Clearly I’m missing something obvious.
I wasn’t the downvoter (nor the upvoter), and wouldn’t have downvoted; but I would suggest considering the abstract version of the problem:
I stumbled across this fix and unfortunately discovered what I consider to be a massive problem with it—it would imply that your utility function is non-computable.
OK. So in order for this to work, it needs to be the case that your prior has the property that: P(3^^^3 disutility | I fail to give him $5) << 1/3^^^3.
Unfortunately, if we have an honest Kolmogorov prior and utility is computable by a Turing machine of complexity << 3^^^3, this cannot possibly be the case. In particular, it is a theorem that for any computable function C (whose Turing machine has complexity K(C)) and any N such that there exists some x with C(x) > N, the Kolmogorov prior over x satisfies P( C(x) > N ) >> 2^{ -K(C) - K(N) }.
Now, since K(3^^^3) is small, as long as utility is computed by a small Turing machine and it is possible to have 3^^^3 disutility, such a circumstance will not be too unlikely under a Kolmogorov prior.
For those interested, here’s how the theorem is proved. I will produce a Turing machine of size K(C) + K(N) + O(1) that outputs an x (in fact, the smallest x) such that C(x) > N. By definition, I can encode C and N in size K(C) + K(N) + O(1). I then have the Turing machine enumerate all x until it finds one with C(x) > N, and output that x. Since the x it outputs therefore has complexity at most K(C) + K(N) + O(1), this provides the claimed lower bound on its prior probability.
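As a sketch in code (C and enumerate_scenarios are placeholders for an arbitrary computable utility function and an enumeration of its inputs; the point is only how little the program needs to contain):

```python
# Sketch of the machine described above. C stands for an arbitrary computable
# utility function and enumerate_scenarios for any enumeration of its inputs;
# both are placeholders, since the argument works for any such C.

def first_scenario_with_utility_above(C, N, enumerate_scenarios):
    """Return the first scenario x with C(x) > N.

    The only data this program needs built in are descriptions of C and N,
    so its length is about K(C) + K(N) + O(1); the x it outputs therefore
    has Kolmogorov complexity at most about K(C) + K(N), even though C(x)
    itself exceeds N."""
    for x in enumerate_scenarios():
        if C(x) > N:
            return x
```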
I guess the problem is that if you just have a Kolmogorov prior, there is a relatively simple universe that is actually out to get you. In fact, being the shortest computation causing 3^^^3 disutility is actually a pretty simple condition.
The way around Pascal’s mugging is to have a bounded utility function. Even if you are a paperclip-maximizer, your utility function is not the number of paperclips in the universe; it is some bounded function that is monotonic in the number of paperclips but asymptotes out. You are only linear in paperclips over small numbers of paperclips. This is not due to exponential discounting, but because utility doesn’t mean anything other than the function whose expected value we are maximizing. There is an unfortunate namespace collision with the other sense of “utility”, some intuitive quantification of our preferences that is probably closer to a description of the trades we would be willing to make. If you are unwilling to be mugged by Pascal’s mugger, then it simply follows as a mathematical fact that your utility function is bounded by something on the order of the reciprocal of the probability at which you become un-muggable.
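As a minimal sketch of what such a bounded, monotone, nearly-linear-for-small-numbers function could look like (the exponential form and the constants are assumptions made purely for illustration):

```python
import math

# Illustrative bounded utility over paperclip counts: monotone increasing,
# approximately linear for small n, asymptoting to U_MAX for large n.
# U_MAX and SCALE are made-up constants for the sake of the example.

U_MAX = 1.0e6    # least upper bound of the utility function
SCALE = 1.0e4    # paperclip count over which utility is roughly linear

def paperclip_utility(n):
    return U_MAX * (1.0 - math.exp(-n / SCALE))

print(paperclip_utility(10))       # ~999.5: close to linear (U_MAX/SCALE per clip)
print(paperclip_utility(10**100))  # ~U_MAX: astronomical offers add almost nothing
```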
For more of a description, see my post here, which originally got downvoted to oblivion because it argued from a position of ignorance of the VNM utility theorem. The post has since been fixed, and, while not super-detailed, it lays out an argument for why Pascal’s mugging is resolved once we stop trying to make our utility functions look intuitive.
Incidentally, Pascal’s mugging does lay out a good argument of why we need to be careful about an AGI’s utility function; if we make it unbounded then we can get weird behavior indeed.
EDIT: Of course, perhaps I am still wrong somehow and there are unresolvable subtleties that I am missing. But I, at least, am simply unwilling to care about events occurring with probability 10^(-100), regardless of how bad they are.
Way around? If my utility function suggests that being mugged by Pascal is the best thing for me to do then I’ll be delighted to do it.
Utility functions determine our decisions, not the reverse!
A utility function shouldn’t suggest anything. It is simply an abstract mathematical function that is guaranteed to exist by the VNM utility theorem. If you’re letting an unintuitive mathematical theorem tell you to do things that you don’t want to do, then something is wrong.
Again, the problem is there is a namespace collision between the utility function guaranteed by VNM, which we are maximizing the expected value of, and the utility function that we intuitively associate with our preferences, which we (probably) aren’t maximizing the expected value of. VNM just says that if you have consistent preferences, then there is some function whose expected value you are maximizing. It doesn’t say that this function has anything to do with the degree to which you want various things to happen.
I seem to be having a lot of trouble getting this point across, so let me try to put it another way: Ignore Kolmogorov complexity, priors, etc. for a moment, and if you can, forget about your utility function and just ask yourself what you would want. Now imagine the worst possible thing that could happen (you can even suppose that both time and space are potentially infinite, so infinitely many people being tortured for infinite extents of time is fine). Let us call this thing X. Suppose that you have somehow calculated that, with probability 10^(-100), the mugger will cause X to happen if you don’t pay him $5. Would you pay him? If you would pay him, then why?
I am actually quite interested in the answer to this question, because I am having trouble diagnosing the precise source of my disagreement on this issue. And even though I said to forget about utility functions, if you really think that is the answer to the “why” question, feel free to use them in your argument. As I said, at this point I am most interested in determining why we disagree, because previous discussions with other people suggest that there is some hidden inferential distance afoot.
As an aside, if you wouldn’t pay him then the definition of utility implies that u($5) > 10^(-100) u(X), which implies that u(X), and therefore the entire utility function, is bounded.
As was pointed out in the other subthread, you are assuming the conclusion you wish to prove here, viz. that the utility function is (necessarily) bounded.
Fine, I was slightly sloppy in my original proof (not only in the way you pointed out, but also in keeping track of signs). Here is a rigorous version:
Suppose that there is nothing so bad that you would pay $5 to stop it from happening with probability 10^(-100). Let X be a state of the universe. Then u(-$5) < 10^(-100) u(X), so u(X) > 10^(100) u(-$5). Since u(X) > 10^(100) u(-$5) for all X, u is bounded below.
Similarly, suppose that there is nothing so good that you would pay $5 to have a 10^(-100) chance of it happening. Then u($5) > 10^(-100) u(X) for all X, so u(X) < 10^(100) u($5), hence u is also bounded above.
Now I’ve given proofs that u is bounded both above and below, without looking at argmax u or argmin u (which incidentally probably don’t exist even if u is bounded; it is much more likely that u asymptotes out).
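Restated compactly, with u(-$5) and u(+$5) the utilities of losing or keeping the $5 as above:

```latex
\forall X:\;\; u(-\$5) \;<\; 10^{-100}\,u(X)
  \;\;\Longrightarrow\;\; u(X) \;>\; 10^{100}\,u(-\$5) \qquad\text{(bounded below)}
\forall X:\;\; u(+\$5) \;>\; 10^{-100}\,u(X)
  \;\;\Longrightarrow\;\; u(X) \;<\; 10^{100}\,u(+\$5) \qquad\text{(bounded above)}
```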
My proof is still not entirely rigorous, for instance u(-$5) and u($5) will in general depend on my current level of income / savings. If you really want me to, I can write everything out completely rigorously, but I’ve been trying to avoid it because I find that diving into unnecessary levels of rigor only obscures the underlying intuition (and I say this as someone who studies math).
Again, why assume this?
Your question has two possible meanings to me, so I’ll try to answer both.
Meaning 1: Why is this a reasonable assumption in the context of the current debate?
Answer: Because if there was something that bad, then you get Pascal’s mugged in my hypothetical situation. What I have shown is that either you would give Pascal $5 in that scenario, or your utility function is bounded.
Meaning 2: Why is this a reasonable assumption in general?
Answer: Because things that occur with probability 10^(-100) don’t actually happen. Actually, 10^(-100) might be a bit high, but certainly things that occur with probability 10^(-10^(100)) don’t actually happen.
You seem not to have understood the post. The worse something is, the more difficult it is for the mugger to make the threat credible. There may be things that are so bad that I (or my hypothetical AI) would pay $5 not to raise their probability to 10^(-100), but such things have prior probabilities that are lower than 10^(-100), and a mugger uttering the threat will not be sufficient evidence to raise the probability to 10^(-100).
We don’t need to declare 10^(-100) equal to 0. 10^(-100) is small enough already.
I have to admit that I did find the original post somewhat confusing. However, let me try to make sure that I understood it. I would summarize your idea as saying that we should have u(X) = O(1/p(X)), where u is the utility function and p is our posterior estimate of X. Is that correct? Or do you want p to be the prior estimate? Or am I completely wrong?
Yes, p should be the prior estimate. The point being that the posterior estimate is not too different from the prior estimate in the “typical” mugging scenario (i.e. someone says “give me $5 or I’ll create 3^^^^3 units of disutility” without specifying how in enough detail).
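To spell out why that condition blocks the typical mugging, write the bound as u(X) <= C / p(X) for some constant C (this is just the O(1/p(X)) condition with the constant made explicit):

```latex
u(X) \;\le\; \frac{C}{p(X)}
\quad\Longrightarrow\quad
p(X)\,u(X) \;\le\; C \qquad \text{for every scenario } X ,
```

so no single scenario, however extravagant its promised utility, contributes more than C to the prior expected utility; only evidence that genuinely raises p(X) far above its prior can change that, and a bare assertion by the mugger does not.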
So, backing up, let me put forth my biggest objections to your idea, as I see it. I will try to stick to only arguing about this point until we can reach a consensus.
I do not believe there is anything so bad that you would trade $5 to prevent it from happening with probability 10^(-500). If there is, please let me know. If not, then this is a statement that is independent of your original priors, and which implies (as noted before) that your utility function is bounded.
I concede that the condition u(X) = O(1/p(X)) implies that one would be immune to the classical version of the Pascal’s mugging problem. What I am trying to say now is that it fails to be immune to other variants of Pascal’s mugging that would still be undesirable. While a good decision theory should certainly be immune to [the classical] Pascal’s mugging, a failure to be immune to other mugging variants still raises issues.
My claim (which I supported with math above) is that the only way to be immune to all variants of Pascal’s mugging is to have a bounded utility function.
My stronger claim, in case you agree with all of the above but think it is irrelevant, is that all humans have a bounded utility function. But let’s avoid arguing about this point until we’ve resolved all of the issues in the preceding paragraphs.
I’m a little suspicious of talking about “the utility function” of a human being. We are messy biological creatures whose behavior is determined, most directly, by electrochemical stuff and not economic stuff. Our preferences are not consistent from minute to minute, and there is a lot of inconsistency between our stated and revealed preferences. We are very bad at computing probabilities. And so on. It’s better to speak of a given utility function approximating the preferences of a given human being. I think we can (we have to) leave this notion vague and still make progress.
I think that this is plausible. In the vaguer language of 0., we could wonder if “any utility function that approximates the preferences of a human being is bounded.” The partner of this claim, that events with probability 10^(-500) can’t happen, is also plausible. For instance, they would both follow from any kind of ultrafinitism. But however plausible we find it, none of us yet know whether it’s the case, so it’s valuable to consider alternatives.
Write X for a terrible thing (or, if you prefer the philanthropy version, a wonderful thing) that has probability 10^(-500). To pay $5 to prevent X means, by revealed preference, that |U(X)| > 5*10^(500). Part of Komponisto’s proposal is that, for a certain kind of utility function, this would imply that X is very complicated—too complicated for him to write down. So he couldn’t prove to you (not in this medium!) that so-and-so’s utility function can take values this high by describing an example of something that terrible. It doesn’t follow that U(X) is always small—especially not if we remain agnostic about ultrafinitism.
Okay, thanks. So it is the prior, not the posterior, which makes more sense (as the posterior will be in general changing while the utility function remains constant).
My objection to this is that, even though you do deal with the “typical” mugging scenario, you run into issues in other scenarios. For instance, suppose that your prior for X is 10^(-1000), and your utility for X is 10^750, which I believe fits your requirements. Now suppose that I manage to argue your posterior up to 10^(-500). Either you can get mugged (for huge amounts of money) in this circumstance, or your utility on X is actually smaller than 10^(500).
Getting “mugged” in such a scenario doesn’t seem particularly objectionable when you consider the amount of work involved in raising the probability by a factor of 10^(500).
It would be money well earned, it seems to me.
I don’t see how this is relevant. It doesn’t change the fact that you wouldn’t actually be willing [I don’t think?] to make such a trade.
The mugger also doesn’t have to do all the work of raising your probability by a factor of 10^(500), the universe can do most (or all) of it. Remember, your priors are fixed once and for all at the beginning of time.
In the grand scheme of things, 10^(500) isn’t all that much. It’s just 1661 bits.
Why shouldn’t I be? A 10^(-500) chance of utility 10^(750) yields an expected utility of 10^(250). This sounds like a pretty good deal to me, especially when you consider that “expected utility” is the technical term for “how good the deal is”.
(I’ll note at this point that we’re no longer discussing Pascal’s mugging, which is a problem in epistemology, about how we know the probability of the mugger’s threat is so low; instead, we’re discussing ordinary expected utility maximization.)
You postulated that my prior was 10^(-1000), and that the mugger raised it to 10^(-500). If other forces in the universe cooperated with the mugger to accomplish this, I don’t see how that changes the decision problem.
In which case, we can also say that a posterior probability of 10^(-500) is “just” 1661 bits away from even odds.
I know what the definition of utility is. My claim is that there does not exist any event such that you would care about it happening with probability 10^(-500) enough to pay $5.
You said that you would be okay with losing $5 to a mugger who raised your posterior by a factor of 10^(500), because they would have to do a lot of work to do so. I responded by pointing out that they wouldn’t have to do much work after all. If this doesn’t change the decision problem (which I agree with) then I don’t see how your original reasoning that it’s okay to get mugged because the mugger would have to work hard to mug you makes any sense.
At the very least, I consider making contradictory [and in the first case, rather flippant] responses to my comments to be somewhat logically rude, although I understand that you are the OP on this thread, and thus have to reply to many people’s comments and might not remember what you’ve said to me.
I believe that this entire back-and-forth is derailing the discussion, so I’m going to back up a few levels and try to start over.
Granted.
What determines how much I am willing to pay is not how hard the mugger works per se, but how credible the threat is compared to its severity. (I thought this went without saying, and that you would be able to automatically generalize from “the mugger working hard” to “the mugger’s credibility increasing by whatever means”.) Going from p = 10^(-1000) to p = 10^(-500) may not sound like a “huge” increase in credibility, but it is. Or at least, if you insist that it isn’t, then you also have to concede that going from p = 10^(-500) to p = 1⁄2 isn’t that big of a credibility increase either, because it’s the same number of bits. In fact, measured in bits, going from p = 10^(-1000) to p = 10^(-500) is one-third of the way to p = 1-10^(-500) !
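Here is the arithmetic spelled out, for anyone who wants to check it (numbers exactly as in this exchange; for p much smaller than 1, log-odds in bits is approximately log2(p)):

```python
from math import log2

# Credibility in bits of log-odds, log2(p / (1 - p)); for p << 1 this is
# approximately log2(p), which is the approximation used here.
def approx_log_odds_bits(log10_p):
    return log10_p * log2(10)

start  = approx_log_odds_bits(-1000)   # p = 10^-1000     ->  about -3322 bits
middle = approx_log_odds_bits(-500)    # p = 10^-500      ->  about -1661 bits
end    = -approx_log_odds_bits(-500)   # p = 1 - 10^-500  ->  about +1661 bits

print(middle - start)                    # ~1661: the factor of 10^500, in bits
print((middle - start) / (end - start))  # ~0.333: one-third of the way there
```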
Now I presume you understand this arithmetic, so I agree that this is a distraction. In the same way, I think the simple mathematical arguments that you have been presenting are also a distraction. The real issue is that you apparently don’t believe that there exist outcomes with utilities in the range of 10^(750). Well, I am undecided on that question, because at this point I don’t know what “my” values look like in the limit of superintelligent extrapolation on galactic scales. (I like to think I’m pretty good at introspection, but I’m not that good!) But there’s no way I’m going to be convinced that my utility function has necessarily to be bounded without some serious argument going significantly beyond the fact that the consequences of an unbounded utility function seem counterintuitive to another human whose style of thought has already been demonstrated to be different from my own.
If you’ve got serious, novel arguments to offer for why a human-extracted utility function must be bounded, I’m quite willing to consider them, of course. But as of now I don’t have much evidence that you do have such arguments, because as far as I can tell, all you’ve said so far is “I can’t imagine anything with such high utility!”
Fair enough.
P.S. Given that we’ve apparently had protracted disagreements on two issues so far, I just wanted you to know that I’m not trying to troll you or anything (in fact, I hadn’t realized that you were the same person who had made the Amanda Knox post). I will try to keep in mind in the future that our thinking styles are different and that appeals to intuition will probably just result in frustration.
This doesn’t actually imply that the entire utility function is bounded. It is still possible that u(Y) is infinite, where Y is something that is valued positively.
As an aside we can now consider the possibility of Pascal’s Samaritan.
Assume a utility function such that u(Y) is infinite (and neutral with respect to risk). Further assume that you predict that $5 would increase your chance of achieving Y by 1/3^^^3. A Pascal Samaritan can offer to pay you $5 for the opportunity to give you a 90% chance of sending the entire universe into the hell state X. Do you take the $5?
From my reply to komponisto (incidentally, both you and he seem to be making the same objections in parallel, which suggests that I’m not doing a very good job of explaining myself, sorry):
The meaning of a phrase, primarily. And slightly about the proper use of an abstract concept.
A utility function should be a representation of my values. If my values are such that paying a mugger is the best option then I am glad to pay a mugger.
If I were to pay him it would be because I happen to value not having a 10^(-100) chance of X happening more than I value $5.
My utility function quite likely is bounded. Not because that is a way around Pascal’s mugging; simply because that happens to be the arbitrary value system represented by this particular bunch of atoms.
Hm...it sounds like we agree on far more than I thought, then.
What I am saying is that my utility function is bounded because it would be ridiculous to be Pascal’s mugged, even in the hypothetical universe I created that disobeys komponisto’s priors. Put another way, I am simply not willing to seriously consider events at probabilities of, say, 10^(-10^(100)), because such events don’t happen. For this same reason, I have a hard time taking anyone seriously who claims to have an unbounded utility function, because they would then care about events that can’t happen in a sense at least as strong as the sense that 1 is not equal to 2.
Would you object to anything in the above paragraph? Thanks for bearing with me on this, by the way.
P.S. Am I the only one who is always tempted to write “mugged by Pascal” before realizing that this is comically different from being “Pascal’s mugged”?
As far as I know they do happen. To know that such a number cannot represent an altogether esoteric feature of the universe, one that can nevertheless be the legitimate subject of infinite value, I would need to know the smallest number that can be assigned to a quantum state.
(This objection is purely tangential. See below for significant disagreement.)
That isn’t true. Someone can assign infinite utility to Australia winning the Ashes if that is what they really want. I’d think them rather silly, but that is just my subjective evaluation, nothing to do with maths.
I think you are conflating quantum probabilities with Bayesian probabilities here, but I’m not sure. Unless you think this point is worth discussing further I’ll move on to your more significant disagreement.
Hm...I initially wrote a two-paragraph explanation of why you were wrong, then deleted it because I changed my mind. So, I think we are making progress!
I initially thought I accorded disdain to unbounded utility functions for the same reason that I accorded disdain to ridiculous priors. But the difference is that your priors affect your epistemic state, and in the case of beliefs there is only one right answer. On the other hand, there is nothing inherently wrong with being a paperclip maximizer.
I think the actual issue I’m having is that I suspect that most people who claim to have unbounded utility functions would have been unwilling to make the trades implied by this before reading about VNM utility / “Shut up and multiply”. So my objection is not that unbounded utility functions are inherently wrong, but that they cannot possibly reflect the preferences of a human.
On this I believe we approximately agree.
The post you’re commenting on argues that Pascal’s mugging is already solved by merely letting the utility function be bounded by Kolmogorov complexity. Obviously, having it be uniformly bounded also solves the problem, but why resort to something so drastic if you don’t need to?
The OP is not living in the least convenient possible world. In particular, let X be the worst thing that could happen. Suppose that at the end of the day you have calculated that X will occur with probability 10^(-100) if you don’t pay the mugger $5. Assuming that you wouldn’t pay the mugger, then by definition of the utility function it follows that u($5) > 10^(-100) u(X). So u(X) < 10^(100) u($5) and is therefore bounded. Since X is the worst thing that could happen, this means that your entire utility function is bounded.
See also my reply to wedrifid where this argument is slightly expanded.
If your utility function is not bounded (below), then there is no “worst thing that could happen.”
See my reply to komponisto in the comment above.
When denizens here say “value is complex” what they mean is something like “the things which humans want have no concise expression”. They don’t literally mean that a utility counter measuring the extent to which those values are met is difficult to compress. That would not make any sense.
I don’t understand what you mean. Say more?