Seeking Estimates for P(Hell)
I am trying to decide how to allocate my charitable donations between GiveWell’s top charities and MIRI, and I need a probability estimate to make an informed decision. Could you help me?
Background on my moral system: I place a greater value on reducing high doses of suffering of conscious entities than merely preventing death. An unexpected, instant, painless death is unfortunate, but I would prefer it to a painful and chronic condition.
Given my beliefs, it follows logically that I would pay a relatively large amount to save a conscious entity from prolonged torture.
The possibility of an AI torturing many conscious entities has been mentioned [1] on this site, and I assume that funding MIRI will help reduce its probability. But what is its current probability?
Obviously a difficult question, but it seems to me that I need an estimate and there is no way around it. I don’t even know where to start...suggestions?
[1] http://lesswrong.com/lw/1pz/the_ai_in_a_box_boxes_you/
Short answer:
Donate to MIRI, or split between MIRI and GiveWell charities if you want some fuzzies for short-term helping.
Long answer:
I’m a negative utilitarian (NU) and have been thinking since 2007 about the sign of MIRI for NUs. (Here’s some relevant discussion.) I give ~70% chance that MIRI’s impact is net good by NU lights and ~30% that it’s net bad, but given MIRI’s high impact, the expected value of MIRI is still very positive.
As far as your question: I’d put the probability of uncontrolled AI creating hells higher than 1 in 10,000 and the probability that MIRI as a whole prevents that from happening higher than 1 in 10,000,000. Say such hells used 10^-15 of the AI’s total computing resources. Assuming computing power to create ~10^30 humans for ~10^10 years, MIRI would prevent in expectation ~10^18 hell-years. Assuming MIRI’s total budget ever is $1 billion (too high), that’s ~10^9 hell-years prevented per dollar. Now apply rigorous discounts to account for priors against astronomical impacts and various other far-future-dampening effects. MIRI still seems very promising at the end of the calculation.
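As a sanity check, here is the same Fermi estimate as a few lines of Python, reading the 1-in-10,000,000 figure as the all-in probability that MIRI prevents the hells (i.e. it already folds in the chance that the hells would otherwise exist); the variable names are mine, and the inputs are just the rough point estimates assumed above, not measurements:

```python
# Back-of-the-envelope version of the estimate above.
# All inputs are the rough point estimates assumed in the comment.

p_miri_prevents_hells = 1e-7   # all-in chance that MIRI prevents the hells from ever existing
hell_fraction = 1e-15          # share of the AI's computing resources spent on hells
human_years = 1e30 * 1e10      # ~10^30 humans running for ~10^10 years
miri_budget_usd = 1e9          # generous guess at MIRI's total all-time budget

expected_hell_years_prevented = p_miri_prevents_hells * hell_fraction * human_years
hell_years_per_dollar = expected_hell_years_prevented / miri_budget_usd

print(f"expected hell-years prevented: {expected_hell_years_prevented:.0e}")  # ~1e+18
print(f"hell-years prevented per dollar: {hell_years_per_dollar:.0e}")        # ~1e+09
```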
Okay. I’m sure you’ve seen this question before, but I’m going to ask it anyway.
Given a choice between
A world with seven billion mildly happy people, or
A world with seven billion minus one really happy people, and one person who just got a papercut
Are you really going to choose the former? What’s your reasoning?
From a practical perspective, accepting the papercut is the obvious choice because it’s good to be nice to other value systems.
Even if I’m only considering my own values, I give some intrinsic weight to what other people care about. (“NU” is just an approximation of my intrinsic values.) So I’d still accept the papercut.
I also don’t really care about mild suffering—mostly just torture-level suffering. If it were 7 billion really happy people plus 1 person tortured, that would be a much harder dilemma.
In practice, the ratio of expected heaven to expected hell in the future is much smaller than 7 billion to 1, so even if someone is just a “negative-leaning utilitarian” who cares orders of magnitude more about suffering than happiness, s/he’ll tend to act like a pure NU on any actual policy question.
The second option is a world with seven billion −1 really happy people and one person who is a tiny bit less than mildly happy?
My reason to choose the former would be that each of those lives is experienced by only one person, and everyone experiences only one life. In the former case, no subjective experience is worse than mildly happy. In the latter case, one subjective experience is worse than that. It doesn’t matter how much happiness or pain a number of people will cumulatively experience, because no one actually experiences the cumulative total. All that matters is improving the worst life at any given moment.
I won’t be surprised if my reasoning is bullshit, but I’m not seeing it.
The problem I see here is that if you literally care only about the “worst life at any given moment”, then the situations “seven billion extremely happy people, one mildly unhappy person” and “seven billion mildly happy people, one mildly unhappy person” are equivalent, because the worst-off person is in the same situation. Which means that if you had a magical button that could convert the latter situation into the former, you wouldn’t bother pressing it, because you wouldn’t see a point in doing so. Is that what you really believe?
I care about wellbeing, but only second to pain. I’d definitely press a button maximizing happiness if it didn’t cause individual unhappiness worse than it cured. Doesn’t that make sense?
On second thought, two equally happy people > one and likewise with unhappiness. Maybe it doesn’t make sense after all. Or it’s a mix of a moral guideline (NU) and personal preference?
Good point. Also, in most multiverse theories, the worst possible experience necessarily exists somewhere.
And this is why destroying everything in existence doesn’t seem obviously evil (not that I’d act on it...)
That would also be futile, because somewhere in the multiverse your plans to destroy everything would fail.
A torturing AI is most likely to happen as a result of deliberate human action, because in many types of negotiation you want to threaten your opponents with the worst possible punishment: for example, “convert your population to my religion or I will subject your population to eternal torture by my AIs.”
I think the scenario of an AI torturing humans in the future is very, very unlikely. For most possible goals an AI could have, it will have ways to accomplish them that are more effective than torturing humans.
The chance of an AI torturing humans as a means to some other goal does seem low, but what about the AI torturing humans as an end in itself? I think CEV could result in this with non-negligible probability (>0.000001). I wouldn’t be surprised if the typical LessWrong poster has a very different morality from the majority of the population, so our intuitions about the results of CEV could be very wrong.
Note that it does not suffice for us to have a different conscious morality or different verbal statements of values. That only matters if the difference remains under extrapolation, e.g., of what others would want if they knew there weren’t any deities.
While others will probably answer your question as-is, I’d just like to point out that for most people who care about AI and who support MIRI, this is not the line of reasoning that convinced them, nor is it the best reason to care. FAI is important because it would fix most of the world’s problems and ensure us all very long, fulfilling lives, and because without it, we’d probably fail to capture the stars’ potential and instead wither and die of old age.
Torture mostly comes up because philosophical thought-experiments tend to need a shorthand for “very bad thing not otherwise specified”, and it’s an instance of that which won’t interact with other parts of the thought experiments or invite digressions.
If you believe my moral system (not the topic of this post) is patently absurd, please PM me the full version of your argument. I promise to review it with an open mind. Note: I am naturally afraid of torture outcomes, but that doesn’t mean I’m not excited about FAI. That would be patently absurd.
To clarify: are you saying there is no chance of torture?
Yes, I am saying that the scenario you allude to is vanishingly unlikely.
But there’s another point, which cuts close to the core of my values, and I suspect it cuts close to the core of your values, too. Rather than explain it myself, I’m going to suggest reading Scott Alexander’s Who By Very Slow Decay, which is about aging.
That’s the status quo. That’s one of the main reasons I, personally, care about AI: because if it’s done right, then the thing Scott describes won’t be a part of the world anymore.
Good piece, thank you for sharing it.
I agree with you and Scott Alexander—painful death from aging is awful.
I second this. Mac, I suggest you read “Existential Risk Prevention as a Global Priority”, if you haven’t already, to further understand why an AI killing all life (even painlessly) would be extremely harmful.
Note that you are talking about a very low probability of a potentially very high impact scenario. This is going to push the envelope on your system for determining attitudes towards uncertain events.
One would assume that human-to-human torture is a low-enough risk with low-enough impact that it doesn’t trigger spending concerns. You could then estimate how much more unlikely a malicious AI would be, and how much worse it could be in comparison. It could be argued that the badness grows faster than the probability shrinks, making it the more relevant scenario, and you could try to quantify that with some Fermi estimation. If one intelligence could have the total misery impact of a Nazi concentration camp (a highish unit of torture), what per-person chance of such agents existing would make you cough up cash? You can then raise the unit of terribleness: if every human could choose to trigger global thermonuclear war (which I think only the top brass of a couple of countries can currently do easily), what rate of moral fallibility would be unacceptable?

There is also the idea that, if you are asking for a probability in order to choose a matching allocation, you can instead fix the allocation first and ask what the probability would need to be for that allocation to still be warranted. For example: “Is the probability high enough that giving MIRI $1 is warranted?” That is a yes-or-no question, and its answer gives a lower bound on the required probability times impact, or whatever the relevant quantity about the future turns out to be. One way to run that inversion is sketched below.
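A minimal sketch of that break-even inversion, using made-up placeholder figures (a $25-per-life-year benchmark, an assumed hell size, and an assumed per-dollar chance that the donation averts it) and treating one averted hell-year as worth at least one added life-year, per the original poster’s stated values; none of these numbers come from the thread:

```python
# Break-even version of the question above: fix the donation, then solve for how
# likely the hell scenario would have to be for that donation to beat a benchmark.
# Every number here is an illustrative placeholder, not an estimate from the thread.

benchmark_years_per_usd = 1 / 25        # e.g. ~1 life-year per $25 via a conventional top charity
hell_years_if_it_happens = 1e25         # assumed total size of the hell scenario
p_dollar_averts_given_hell = 1e-16      # assumed chance $1 to MIRI averts it, given it would otherwise happen

# The donation clears the bar when:
#   P(hell) * p_dollar_averts_given_hell * hell_years_if_it_happens >= benchmark_years_per_usd
breakeven_p_hell = benchmark_years_per_usd / (p_dollar_averts_given_hell * hell_years_if_it_happens)
print(f"$1 to MIRI clears the benchmark if P(hell) exceeds ~{breakeven_p_hell:.0e}")  # ~4e-11 here
```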
Do note that because you are applying a very rough description of your ethics to a very extreme situation, it would be good to be confident that the description really applies. For example, because you prefer non-suffering over life, you ought also to prefer death over suffering. The implication is that killing off people who are miserable improves the situation, i.e. that euthanasia should be exercised wherever applicable. It also means that maximizing whatever the opposite of suffering is should matter more than life extension, which leads to some form of “short rock-star life” over “long grey life” (the details differ a little depending on whether the positive is virtuousness, pleasure, or goal attainment).

If you are asymmetric about torture and anti-torture, you also need to decide your attitude toward a future with roughly equal expected amounts of torture and anti-torture versus a dull nothing-happens future (no torture or anti-torture), i.e. does a chance of heaven mitigate a chance of hell? Then there is the question of whether you care about the depth of the torture, i.e. how many circles the hell has: if you could halve the chance of hell but make it twice as bad when it happens, is that a neutral move or an improvement? Given that you anti-value torture even more than death, it is also worth asking whether there is a maximum amount of anti-value that can be created. You can’t be deader than dead, but it might be possible to create really heinous forms of torture and specifically boost the consciousness level of the tortured to make them especially aware of the horribleness of it all (i.e. is there a limit to anti-sedation?). There might also be the possibility of anti-wireheading: torture that doesn’t involve doing anything bad to the person, but instead redefines pain signals so that everything they experience is interpreted as pain, i.e. instead of doing something to their bodies that they hate, making them hate their bodies (or the fact that they are human).
I do have to note that because the situation involves extreme conditions, there are a couple of differences from a typical assessment situation. In a normal assessment we can pretty much imagine all the possibilities but lack information about which of them might be the case. In this scenario, however, we are not imaginatively complete and might not reach that state by the end of the analysis; that is, a significant portion of the relevant mechanisms are ones we fail to imagine at all. This also creates a need to “imagine in a fair sampling way”. Just as machines that search only proofs up to a finite length can have their outcome altered by the order and expression length of propositions, this kind of analysis might be sensitive to which parts of the situation you happen to understand. If you wanted to reach a particular yes-or-no outcome (motivated cognition), you could explore only the ways that outcome might happen; to avoid that, you would need to know that your ratio of “imaginative attention” reflects the ratio of probabilities. Moreover, an AI scenario involves the use of yet-undiscovered technologies, and the probability-affecting properties of those technologies can’t be summed neatly without knowing their details. In the same way that it’s hard to evaluate basic research versus applied research in economic terms, it’s hard to evaluate how well ethics versus political stability guarantees that the world will stay a place everybody likes. Usually, attempting research doesn’t produce anti-research (running experiments doesn’t burn libraries), but messing with ethics has a real chance of producing monsters.
So you might not actually be looking for information to make an informed decision, but for understanding to make a conscious decision. In this area, simply not a lot of information is available.
It is clearly a difficult quantity to estimate correctly. I think, though, that it could be side-stepped if you can find better-defined scenarios that dominate “AI torturing people in simulations”. The way to go depends on whether you care less about the future than the present or not.
For example, in the first case, there are people that are being tortured right now by other real people, so you might want to concentrate on that.
There’re other calculations to consider too (edit: and they almost certainly outweigh the torture possibilities)! For instance:
Suppose you can give one year of life this year by giving $25 to AMF (GiveWell says $3340 to save a child’s life, not counting the other benefits).
If all MIRI does is delay the development of any type of Unfriendly AI, your $25 would need to let MIRI delay that by, ah, 4.3 milliseconds (139 picoyears): one person-year spread across ~7.3 billion people. With 10%-a-year exponential future discounting, 100 years before you expect Unfriendly AI to be created if you don’t help MIRI, and no population growth, that $25 now needs to give them enough resources to delay UFAI by about 31 seconds.
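A rough reproduction of that arithmetic, assuming a constant population of ~7.3 billion and a benefit discounted at 10% per year over 100 years; the discounted figure is sensitive to exactly how the discounting is applied, and a straight 1.1^100 multiplier lands closer to a minute:

```python
# Rough reproduction of the delay arithmetic above; the discounted figure depends
# on exactly how the 10%-per-year discounting is applied.

seconds_per_year = 365.25 * 24 * 3600   # ~3.16e7 seconds
population = 7.3e9                      # assumed constant world population
benefit_life_years = 1.0                # what the $25 to AMF is supposed to buy

# Undiscounted: the delay must hand out one extra person-year spread across everyone alive.
delay_seconds = benefit_life_years * seconds_per_year / population
print(f"undiscounted break-even delay: {delay_seconds * 1000:.1f} ms")            # ~4.3 ms

# Discounting a benefit that arrives ~100 years from now at 10% per year:
discount_factor = 1.10 ** 100                                                     # ~1.4e4
print(f"discounted break-even delay: ~{delay_seconds * discount_factor:.0f} s")   # about a minute with this convention
```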
This is true for any project that reduces humanity’s existential risk. AI is just the saddest if it goes wrong, because then it goes wrong for everything in (slightly less than) our light cone.