Many thanks for posting that link. It’s clearly the most important thing I’ve read on LW in a long time, I’d upvote it ten times if I could.
It seems like an s-risk outcome (even one that keeps some people happy) could be more than a million times worse than an x-risk outcome, while not being a million times more improbable, so focusing on s-risks is correct. The argument wasn’t as clear to me before. Does anyone have good counterarguments? Why shouldn’t we all focus on s-risk from now on?
(Unsong had a plot point where Peter Singer declared that the most important task for effective altruists was to destroy Hell. Big props to Scott for seeing it before the rest of us.)
I don’t buy the “million times worse,” at least not if we talk about the relevant E(s-risk moral value) / E(x-risk moral value) rather than the irrelevant E(s-risk moral value / x-risk moral value). See this post by Carl and this post by Brian. I think that responsible use of moral uncertainty will tend to push you away from this kind of fanatical view.
I agree that if you are million-to-1 then you should be predominantly concerned with s-risk, but I think s-risks are somewhat improbable/intractable, though not that improbable+intractable. I’d guess the probability is ~100x lower, and the available object-level interventions are perhaps 10x less effective. The particular scenarios discussed here seem unlikely to lead to optimized suffering; only “conflict” and “???” really make any sense to me. Even on the negative utilitarian view, it seems like you shouldn’t care about anything other than optimized suffering.
The best object-level intervention I can think of is reducing our civilization’s expected vulnerability to extortion, which seems poorly-leveraged relative to alignment because it is much less time-sensitive (unless we fail at alignment and so end up committing to a particular and probably mistaken decision-theoretic perspective). From the perspective of s-riskers, it’s possible that spreading strong emotional commitments to extortion-resistance (e.g. along the lines of UDT or this heuristic) looks somewhat better than spreading concern for suffering.
The meta-level intervention of “think about s-risk and understand it better / look for new interventions” seems much more attractive than any object-level interventions we yet know, and probably worth investing some resources in even if you take a more normal suffering vs. pleasure tradeoff. If this is the best intervention and is much more likely to be implemented by people who endorse suffering-focused ethical views, it may be the strongest incentive to spread suffering-focused views. I think that higher adoption of suffering-focused views is relatively bad for people with a more traditional suffering vs. pleasure tradeoff, so this is something I’d like to avoid (especially given that suffering-focused ethics seems to somehow be connected with distrust of philosophical deliberation). Ironically, that gives some extra reason for conventional EAs to think about s-risk, so that the suffering-focused EAs have less incentive to focus on value-spreading. This also seems like an attractive compromise more broadly: we all spend a bit of time thinking about s-risk reduction and taking the low-hanging fruit, and suffering-focused EAs do less stuff that tends to lead to the destruction of the world. (Though here the non-s-riskers should also err on the side of extortion-resistance, e.g. trading with the position of rational non-extorting s-riskers rather than whatever views/plans the s-riskers happen to have.)
An obvious first question is whether the existence of suffering-hating civilizations on balance increases s-risk (mostly by introducing game-theoretic incentives) or decreases s-risk (by exerting their influence to prevent suffering, esp. via acausal trade). If the latter, then x-risk and s-risk reduction may end up being aligned. If the former, then at best the s-riskers are indifferent to survival and need to resort to more speculative interventions. Interestingly, in this case it may also be counterproductive for s-riskers to expand their influence or acquire resources. My guess is that mature suffering-hating civilizations reduce s-risk, since immature suffering-hating civilizations probably provide a significant part of the game-theoretic incentive yet have almost no influence, and sane suffering-hating civilizations will provide minimal additional incentives to create suffering. But I haven’t thought about this issue very much.
Carl’s post sounded weird to me, because large amounts of human utility (more than just pleasure) seem harder to achieve than large amounts of human disutility (for which pain is enough). You could say that some possible minds are easier to please, but human utility doesn’t necessarily value such minds enough to counterbalance s-risk.
Brian’s post focuses more on possible suffering of insects or quarks. I don’t feel quite as morally uncertain about large amounts of human suffering, do you?
As to possible interventions, you have clearly thought about this for longer than me, so I’ll need time to sort things out. This is quite a shock.
large amounts of human utility (more than just pleasure) seem harder to achieve than large amounts of human disutility (for which pain is enough).
Carl gave a reason that future creatures, including potentially very human-like minds, might diverge from current humans in a way that makes hedonium much more efficient. If you assigned significant probability to that kind of scenario, it would quickly undermine your million-to-one ratio. Brian’s post briefly explains why you shouldn’t argue “If there is a 50% chance that x-risks are 2 million times worse, then they are a million times worse in expectation.” (I’d guess that there is a good chance, say > 25%, that good stuff can be as efficient as bad stuff.)
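To make that concrete, here is a toy numeric sketch (my own made-up numbers and variable names, not taken from Carl’s or Brian’s posts), fixing the badness of the s-risk outcome and putting all the uncertainty into how much value an x-risk outcome forgoes:

```python
# Toy sketch with made-up numbers: E(s/x) vs. E(s)/E(x).
s_risk_badness = 1.0  # badness of the s-risk outcome, in arbitrary units

# Two equally likely hypotheses about the value lost to an x-risk outcome:
#   H1: good stuff is as efficient as bad stuff -> x-risk forgoes 1.0
#   H2: good stuff is 2,000,000x less efficient -> x-risk forgoes 1 / 2e6
scenarios = [(0.5, 1.0), (0.5, 1.0 / 2_000_000)]

# The "irrelevant" quantity: expectation of the ratio.
expected_ratio = sum(p * s_risk_badness / x_loss for p, x_loss in scenarios)

# The "relevant" quantity: ratio of the expectations.
ratio_of_expectations = s_risk_badness / sum(p * x_loss for p, x_loss in scenarios)

print(f"E(s/x)    ~ {expected_ratio:,.0f}")        # ~1,000,000
print(f"E(s)/E(x) ~ {ratio_of_expectations:.1f}")  # ~2.0
```

Under these toy assumptions the expectation of the ratio is about a million while the ratio of the expectations is only about two, which is why the former is the wrong quantity to act on.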
I would further say: existing creatures often prefer to keep living even given the possibility of extreme pain. This can be easily explained by an evolutionary story, which suffering-focused utilitarians tend to view as a debunking explanation: given that animals would prefer to keep living regardless of the actual balance of pleasure and pain, we shouldn’t infer anything from that preference. But our strong dispreference for intense suffering has a similar evolutionary origin, and is no more reflective of underlying moral facts than is our strong preference for survival.
and suffering-focused EAs do less stuff that tends to lead to the destruction of the world.
In support of this, my system 1 reports that if it sees more intelligent people taking S-risk seriously, it is less likely to nuke the planet if it gets the chance. (I’m not sure I endorse nuking the planet, just reporting an emotional reaction.)
especially given that suffering-focused ethics seems to somehow be connected with distrust of philosophical deliberation
Can you elaborate on what you mean by this? People like Brian or others at FRI don’t seem particularly averse to philosophical deliberation to me...
This also seems like an attractive compromise more broadly: we all spend a bit of time thinking about s-risk reduction and taking the low-hanging fruit, and suffering-focused EAs do less stuff that tends to lead to the destruction of the world.
I support this compromise and agree not to destroy the world. :-)
Those of us who sympathize with suffering-focused ethics have an incentive to encourage others to think about their values now, at least in crude enough terms to take a stance on prioritizing preventing s-risks vs. making sure we get to a position where everyone can safely deliberate their values further and then have them fulfilled. Conversely, if one (normatively!) thinks the downsides of bad futures are unlikely to be much worse than the upsides of good futures, then one is incentivized to promote caution about taking confident stances on anything population-ethics-related, and to instead value deeper philosophical reflection. The latter also has the upside of being good from a cooperation point of view: everyone can work on the same priority (building safe AI that helps with philosophical reflection) regardless of one’s inklings about how personal value extrapolation is likely to turn out.
(The situation becomes more interesting/complicated for suffering-focused altruists once we add considerations of multiverse-wide compromise via coordinated decision-making, which, in extreme versions at least, would call for being “updateless” about the direction of one’s own values.)
Can you elaborate on what you mean by this? People like Brian or others at FRI don’t seem particularly averse to philosophical deliberation to me...
People vary in what kinds of values change they would consider drift vs. endorsed deliberation. Brian has in the past publicly come down unusually far on the side of “change = drift,” I’ve encountered similar views on one other occasion from this crowd, and I had heard second hand that this was relatively common.
Brian or someone more familiar with his views could speak more authoritatively to that aspect of the question, and I might be mistaken about the views of the suffering-focused utilitarians more broadly.
An obvious first question is whether the existence of suffering-hating civilizations on balance increases s-risk (mostly by introducing game-theoretic incentives) or decreases s-risk (by exerting their influence to prevent suffering, esp. via acausal trade). If the former, then x-risk and s-risk reduction may end up being aligned.
Did you mean to say, “if the latter” (such that x-risk and s-risk reduction are aligned when suffering-hating civilizations decrease s-risk), rather than “if the former”?
I feel a weird disconnect on reading comments like this. I thought s-risks were a part of conventional wisdom on here all along. (We even had an infamous scandal that concerned one class of such risks!) Scott didn’t “see it before the rest of us”—he was drawing on an existing, and by now classical, memeplex.
It’s like when some people spoke as if nobody had ever thought of AI risk until Bostrom wrote Superintelligence—even though that book just summarized what people (not least of whom Bostrom himself) had already been saying for years.
I guess I didn’t think about it carefully before. I assumed that s-risks were much less likely than x-risks (true) so it’s okay not to worry about them (false). The mistake was that logical leap.
In terms of utility, the landscape of possible human-built superintelligences might look like a big flat plain (paperclippers and other things that kill everyone without fuss), with a tall sharp peak (FAI) surrounded by a pit that’s astronomically deeper (many almost-FAIs and other designs that sound natural to humans). The pit needs to be compared to the peak, not the plain. If the pit is more likely, I’d rather have the plain.
I didn’t realize then that disutility of human-built AI can be much larger than utility of FAI, because pain is easier to achieve than human utility (which doesn’t reduce to pleasure). That makes the argument much stronger.
I didn’t realize then that disutility of human-built AI can be much larger than utility of FAI, because pain is easier to achieve than human utility (which doesn’t reduce to pleasure).
This argument doesn’t actually seem to be in the article that Kaj linked to. Did you see it somewhere else, or come up with it yourself? I’m not sure it makes sense, but I’d like to read more if it’s written up somewhere. (My objection is that “easier to achieve” doesn’t necessarily mean the maximum value achievable is higher. It could be that it would take longer or more effort to achieve the maximum value, but the actual maximums aren’t that different. For example, maybe the extra stuff needed for human utility (aside from pleasure) is complex but doesn’t actually cost much in terms of mass/energy.)
The argument somehow came to my mind yesterday, and I’m not sure it’s true either. But do you really think human value might be as easy to maximize as pleasure or pain? Pain is only about internal states, and human value seems to be partly about external states, so it should be way more expensive.
One of the more crucial points, I think, is that positive utility is – for most humans – complex and its creation is conjunctive. Disutility, in contrast, is disjunctive. Consequently, the probability of creating the former is smaller than that of the latter – all else being equal (of course, all else is not equal).
In other words, the scenarios leading towards the creation of (large amounts of) positive human value are conjunctive: to create a highly positive future, we have to eliminate (or at least substantially reduce) physical pain and boredom and injustice and loneliness and inequality (at least certain forms of it) and death, etc. etc. etc. (You might argue that getting “FAI” and “CEV” right would accomplish all those things at once (true) but getting FAI and CEV right is, of course, a highly conjunctive task in itself.)
In contrast, disutility is much more easily created and essentially disjunctive. Many roads lead towards dystopia: sadistic programmers, failing AI safety wholesale (or “only” failing at value-loading or extrapolation, or at stable self-modification), some totalitarian regime taking over, etc. etc.
It’s also not a coincidence that even the most untalented writer with the most limited imagination can conjure up a convincing dystopian society. Envisioning a true utopia in concrete detail, on the other hand, is nigh impossible for most human minds.
“[...] human intuitions about what is valuable are often complex and fragile (Yudkowsky, 2011), taking up only a small area in the space of all possible values. In other words, the number of possible configurations of matter constituting anything we would value highly (under reflection) is arguably smaller than the number of possible configurations that constitute some sort of strong suffering or disvalue, making the incidental creation of the latter ceteris paribus more likely.”
Consequently, UFAIs such as paperclippers are more likely to incidentally create large amounts of disutility than utility (factoring out acausal considerations), e.g. because creating simulations is instrumentally useful for them.
Generally, I like how you put it in your comment here:
In terms of utility, the landscape of possible human-built superintelligences might look like a big flat plain (paperclippers and other things that kill everyone without fuss), with a tall sharp peak (FAI) surrounded by a pit that’s astronomically deeper (many almost-FAIs and other designs that sound natural to humans). The pit needs to be compared to the peak, not the plain. If the pit is more likely, I’d rather have the plain.
Yeah. In a nutshell, supporting generic x-risk-reduction (which also reduces extinction risks) is in one’s best interest, if and only if one’s own normative trade-ratio of suffering vs. happiness is less suffering-focused than one’s estimate of the ratio of expected future happiness to suffering (feel free to replace “happiness” with utility and “suffering” with disutility). If one is more pessimistic about the future or if one needs large amounts of happiness to trade-off small amounts of suffering, one should rather focus on s-risk-reduction instead. Of course, this simplistic analysis leaves out issues like cooperation with others, neglectedness, tractability, moral uncertainty, acausal considerations, etc.
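As a minimal sketch of that rule (toy numbers and a hypothetical helper name, ignoring the cooperation, neglectedness, tractability, moral uncertainty and acausal caveats just mentioned):

```python
# Minimal sketch of the "in a nutshell" rule above, with toy numbers.

def prefers_generic_xrisk_reduction(trade_ratio: float,
                                    expected_happiness: float,
                                    expected_suffering: float) -> bool:
    """trade_ratio: units of happiness needed to offset one unit of suffering.

    The future's expected value by your lights is roughly
        expected_happiness - trade_ratio * expected_suffering,
    which is positive iff the expected happiness/suffering ratio of the
    future exceeds your own trade ratio.
    """
    return expected_happiness / expected_suffering > trade_ratio

# Needs 100:1 happiness to offset suffering, expects a 1000:1 future -> supports it.
print(prefers_generic_xrisk_reduction(100, 1000, 1))     # True
# More suffering-focused (10,000:1) with the same forecast -> focuses on s-risk instead.
print(prefers_generic_xrisk_reduction(10_000, 1000, 1))  # False
```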
Yeah, I also had the idea about utility being conjunctive and mentioned it in a deleted reply to Wei, but then realized that Eliezer’s version (fragility of value) already exists and is better argued.
On the other hand, maybe the worst hellscapes can be prevented in one go, if we “just” solve the problem of consciousness and tell the AI what suffering means. We don’t need all of human value for that. Hellscapes without suffering can also be pretty bad in terms of human value, but not quite as bad, I think. Of course solving consciousness is still a very tall order, but it might be easier than solving all philosophy that’s required for FAI, and it can lead to other shortcuts like in my recent post (not that I’d propose them seriously).
Some people at MIRI might be thinking about this under nonperson predicate. (Eliezer’s view on which computations matter morally is different from the one endorsed by Brian, though.) And maybe it’s important to not limit FAI options too much by preventing mindcrime at all costs – if there are benefits against other very bad failure modes (or – cooperatively – just increased controllability for the people who care a lot about utopia-type outcomes), maybe some mindcrime in the early stages to ensure goal-alignment would be the lesser evil.
Human disutility includes more than just pain too. Destruction of humanity (the flat plain you describe) carries a great deal of negative utility for me, even if I disappear without feeling any pain at all. There’s more disutility if all life is destroyed, and more if the universe as a whole is destroyed… I don’t think there’s any fundamental asymmetry. Pain and pleasure are the most immediate ways of affecting value, and probably the ones that can be achieved most efficiently in computronium, so external states probably don’t come into play much at all if you take a purely utilitarian view.
Our values might say, for example, that a universe filled with suffering insects is very undesirable, but a universe filled with happy insects isn’t very desirable. More generally, if our values are a conjunction of many different values, then it’s probably easier to create a universe where one is strongly negative and the rest are zero, than a universe where all are strongly positive. I haven’t seen the argument written up, I’m trying to figure it out now.
Huh, I feel very differently. For AI risk specifically, I thought the conventional wisdom was always “if AI goes wrong, the most likely outcome is that we’ll all just die, and the next most likely outcome is that we get a future which somehow goes against our values even if it makes us very happy.” And besides AI risk, other x-risks haven’t really been discussed at all on LW. I don’t recall seeing any argument for s-risks being a particularly plausible category of risks, let alone one of the most important ones.
It’s true that there was That One Scandal, but the reaction to that was quite literally Let’s Never Talk About This Again—or alternatively Let’s Keep Bringing This Up To Complain About How It Was Handled, depending on the person in question—but then people always only seemed to be talking about that specific incident and argument. I never saw anyone draw the conclusion that “hey, this looks like an important subcategory of x-risks that warrants separate investigation and dedicated work to avoid”.
I don’t recall seeing any argument for s-risks being a particularly plausible category of risks, let alone one of the most important ones.
There was some discussion back in 2012 and sporadically since then. (ETA: You can also do a search for “hell simulations” and get a bunch more results.)
I never saw anyone draw the conclusion that “hey, this looks like an important subcategory of x-risks that warrants separate investigation and dedicated work to avoid”.
I’ve always thought that in order to prevent astronomical suffering, we will probably want to eventually (i.e., after a lot of careful thought) build an FAI that will colonize the universe and stop any potential astronomical suffering arising from alien origins and/or try to reduce suffering in other universes via acausal trade etc., so the work isn’t very different from other x-risk work. But now that the x-risk community is larger, maybe it does make sense to split out some of the more s-risk specific work?
I’ve always thought that in order to prevent astronomical suffering, we will probably want to eventually (i.e., after a lot of careful thought) build an FAI that will colonize the universe and stop any potential astronomical suffering arising from alien origins and/or try to reduce suffering in other universes via acausal trade etc., so the work isn’t very different from other x-risk work.
It seems like the most likely reasons to create suffering come from the existence of suffering-hating civilizations. Do you think that it’s clear/very likely that it is net helpful for there to be more mature suffering-hating civilizations? (On the suffering-focused perspective.)
Do you think that it’s clear/very likely that it is net helpful for there to be more mature suffering-hating civilizations? (On the suffering-focused perspective.)
My intuition is that there is no point in trying to answer questions like these before we know a lot more about decision theory, metaethics, metaphilosophy, and normative ethics, so pushing for a future where these kinds of questions eventually get answered correctly (and the answers make a difference in what happens) seems like the most important thing to do. It doesn’t seem to make sense to try to lock in some answers (i.e., make our civilization suffering-hating or not suffering-hating) on the off chance that when we figure out what the answers actually are, it will be too late. Someone with much less moral/philosophical uncertainty than I do would perhaps prioritize things differently, but I find it difficult to motivate myself to think really hard from their perspective.
If we try to answer the question now, it seems very likely we’ll get the answer wrong (given my state of uncertainty about the inputs that go into the question). I want to keep civilization going until we know better how to answer these types of questions. For example if we succeed in building a correctly designed/implemented Singleton FAI, it ought to be able to consider this question at leisure, and if it becomes clear that the existence of mature suffering-hating civilizations actually causes more suffering to be created, then it can decide to not make us into a mature suffering-hating civilization, or take whatever other action is appropriate.
Are you worried that by the time such an FAI (or whatever will control our civilization) figures out the answer, it will be too late? (Why? If we can decide that x-risk reduction is bad, then so can it. If it’s too late to alter or end civilization at that point, why isn’t it already too late for us?) Or are you worried more that the question won’t be answered correctly by whatever will control our civilization?
If you are concerned exclusively with suffering, then increasing the number of mature civilizations is obviously bad and you’d prefer that the average civilization not exist. You might think that our descendants are particularly good to keep around, since we hate suffering so much. But in fact almost all s-risks occur precisely because of civilizations that hate suffering, so it’s not at all clear that creating “the civilization that we will become on reflection” is better than creating “a random civilization” (which is bad).
To be clear, even if we have modest amounts of moral uncertainty I think it could easily justify a “wait and see” style approach. But if we were committed to a suffering-focused view then I don’t think your argument works.
But in fact almost all s-risks occur precisely because of civilizations that hate suffering
It seems just as plausible to me that suffering-hating civilizations reduce the overall amount of suffering in the multiverse, so I think I’d wait until it becomes clear which is the case, even if I was concerned exclusively with suffering. But I haven’t thought about this question much, since I haven’t had a reason to assume an exclusive concern with suffering, until you started asking me to.
To be clear, even if we have modest amounts of moral uncertainty I think it could easily justify a “wait and see” style approach. But if we were committed to a suffering-focused view then I don’t think your argument works.
Earlier in this thread I’d been speaking from the perspective of my own moral uncertainty, not from a purely suffering-focused view, since we were discussing the linked article, and Kaj had written:
The article isn’t specifically negative utilitarian, though—even classical utilitarians would agree that having astronomical amounts of suffering is a bad thing. Nor do you have to be a utilitarian in the first place to think it would be bad: as the article itself notes, pretty much all major value systems probably agree on s-risks being a major Bad Thing
What’s your reason for considering a purely suffering-focused view? Intellectual curiosity? Being nice to or cooperating with people like Brian Tomasik by helping to analyze one of their problems?
Or are you worried more that the question won’t be answered correctly by whatever will control our civilization?
Perhaps this, in case it turns out to be highly important but difficult to get certain ingredients – e.g. priors or decision theory – exactly right. (But I have no idea, it’s also plausible that suboptimal designs could patch themselves well, get rescued somehow, or just have their goals changed without much fuss.)
That sort of subject is inherently implicit in the kind of decision-theoretic questions that MIRI-style AI research involves. More generally, when one is thinking about astronomical-scale questions, and aggregating utilities, and so on, it is a matter of course that cosmically bad outcomes are as much of a theoretical possibility as cosmically good outcomes.
Now, the idea that one might need to specifically think about the bad outcomes, in the sense that preventing them might require strategies separate from those required for achieving good outcomes, may depend on additional assumptions that haven’t been conventional wisdom here.
Now, the idea that one might need to specifically think about the bad outcomes, in the sense that preventing them might require strategies separate from those required for achieving good outcomes, may depend on additional assumptions that haven’t been conventional wisdom here.
Right, I took this idea to be one of the main contributions of the article, and assumed that this was one of the reasons why cousin_it felt it was important and novel.
Thanks for voicing this sentiment I had upon reading the original comment. My impression was that negative utilitarian viewpoints / things of this sort had been trending for far longer than cousin_it’s comment might suggest.
The article isn’t specifically negative utilitarian, though—even classical utilitarians would agree that having astronomical amounts of suffering is a bad thing. Nor do you have to be a utilitarian in the first place to think it would be bad: as the article itself notes, pretty much all major value systems probably agree on s-risks being a major Bad Thing:
All plausible value systems agree that suffering, all else being equal, is undesirable. That is, everyone agrees that we have reasons to avoid suffering. S-risks are risks of massive suffering, so I hope you agree that it’s good to prevent s-risks.
Decision theory (which includes the study of risks of that sort)
No, it doesn’t. Decision theory deals with abstract utility functions. It can talk about outcomes A, B, and C where A is preferred to B and B is preferred to C, but doesn’t care whether A represents the status quo, B represents death, and C represents extreme suffering, or whether A represents gaining lots of wealth and status, B represents the status quo, and C represents death, so long as the ratios of utility differences are the same in each case. Decision theory has nothing to do with the study of s-risks.
What Alex said doesn’t seem to refute or change what I said.
But also: I disagree with the parent. I take conventional wisdom here to include support for MIRI’s agent foundations agenda, which includes decision theory, which includes the study of such risks (even if only indirectly or implicitly).
Fair enough. I guess I didn’t think carefully about it before. I assumed that s-risks were much less likely than x-risks (true) and so they could be discounted (false). It seems like the right way to imagine the landscape of superintelligences is a vast flat plain (paperclippers and other things that kill everyone without fuss) with a tall thin peak (FAIs) surrounded by a pit that’s astronomically deeper (FAI-adjacent and other designs). The right comparison is between the peak and the pit, because if the pit is more likely, I’d rather have the plain.
I think the reason why cousin_it’s comment is upvoted so much is that a lot of people (including me) weren’t really aware of S-risks or how bad they could be. It’s one thing to just make a throwaway line that S-risks could be worse, but it’s another thing entirely to put together a convincing argument.
Similar ideas have appeared in other articles, but they’ve framed it in terms of energy-efficiency while defining weird terms such as computronium or the two-envelopes problem, which makes it much less clear. I don’t think I saw the links for either of those articles before, but if I had, I probably wouldn’t have read them.
I also think that the title helps as well. S-risks is a catchy name, especially if you already know x-risks. I know that this term has been used before, but it wasn’t used in the title. Further, while it’s quite a good article, you can read the summary, introduction and conclusion without encountering the idea that the author believes that s-risks are much greater than x-risks, as opposed to being just yet another risk to worry about.
I think there’s definitely an important lesson to be drawn here. I wonder how many other articles have gotten close to an important truth, but just failed to hit it out of the park for some reason or another.
Further, while it’s quite a good article, you can read the summary, introduction and conclusion without encountering the idea that the author believes that s-risks are much greater than x-risks, as opposed to being just yet another risk to worry about.
I’m only confident about endorsing this conclusion conditional on having values where reducing suffering matters a great deal more than promoting happiness. So we wrote the “Reducing risks of astronomical suffering” article in a deliberately ‘balanced’ way, pointing out the different perspectives. This is why it didn’t come away making any very strong claims. I don’t find the energy-efficiency point convincing at all, but for those who do, x-risks are likely (though not with very high confidence) still more important, mainly because more futures will be optimized for good outcomes rather than bad outcomes, and this is where most of the value is likely to come from. The “pit” around the FAI-peak is in expectation extremely bad compared to anything that exists currently, but most of it is just accidental suffering that is still comparatively unoptimized. So in the end, whether s-risks or x-risks are more important to work on, on the margin, depends on how suffering-focused or not someone’s values are.
Having said that, I totally agree that more people should be concerned about s-risks and it’s concerning that the article (and the one on suffering-focused AI safety) didn’t manage to convey this point well.
It seems like an s-risk outcome (even one that keeps some people happy) could be more than a million times worse than an x-risk outcome, while not being a million times more improbable, so focusing on s-risks is correct.
And the concept is much older than that. The 2011 Felicifia post “A few dystopic future scenarios” by Brian Tomasik outlined many of the same considerations that FRI works on today (suffering simulations, etc.), and of course Brian has been blogging about risks of astronomical suffering since then. FRI itself was founded in 2013.
Iain Banks’ Surface Detail, published in 2010, featured a war over the existence of virtual hells (simulations constructed explicitly to punish the ems of sinners).
The only counterarguments I can think of would be:
The claim that the likelihood of s-risks is close to that of x-risks seems not well argued to me. In particular, conflict seems to be the most plausible scenario (and one which has a high prior placed on it as we can observe that much suffering today is caused by conflict), but it seems to be less and less likely of a scenario once you factor in superintelligence, as multi-polar scenarios seem to be either very short-lived or unlikely to happen at all.
We should be wary of applying anthropomorphic traits to hypothetical artificial agents in the future. Pain in biological organisms may very well have evolved as a proxy to negative utility, and might not be necessary in “pure” agent intelligences which can calculate utility functions directly. It’s not obvious to me that implementing suffering in the sense that humans understand it would be cheaper or more efficient for a superintelligence to do instead of simply creating utility-maximizers when it needs to produce a large number of sub-agents.
High overlap between approaches to mitigating x-risk and approaches to mitigating s-risks. If the best chance of mitigating future suffering is trying to bring about a friendly artificial intelligence explosion, then it seems that the approaches we are currently taking should still be the correct ones.
More speculatively: If we focus heavily on s-risks, does this open us up to issues regarding utility-monsters? Can I extort people by creating a simulation of trillions of agents and then threatening to minimize their utility? (If we simply value the sum of utility, and not necessarily the complexity of the agent having the utility, then this should be relatively cheap to implement.)
I think the most general response to your first three points would look something like this: Any superintelligence that achieves human values will be adjacent in design space to many superintelligences that cause massive suffering, so it’s quite likely that the wrong superintelligence will win, due to human error, malice, or arms races.
As to your last point, it looks more like a research problem than a counterargument, and I’d be very interested in any progress on that front :-)
So being served a cup of coffee and being served a cup of pure capsaicin are “adjacent in design space”? Maybe, but funny how that problem doesn’t arise or even worry anyone...
That’s a twist on a standard LW argument, see e.g. here:
Fragility of value is the thesis that losing even a small part of the rules that make up our values could lead to results that most of us would now consider as unacceptable
It seems to me that fragility of value can lead to massive suffering in many ways.
You’re basically dialing that argument up to eleven. From “losing a small part could lead to unacceptable results” you are jumping to “losing any small part will lead to unimaginable hellscapes”:
with a tall sharp peak (FAI) surrounded by a pit that’s astronomically deeper
Yeah, not all parts. But even if it’s a 1% chance, one hellscape might balance out a hundred universes where FAI wins. Pain is just too effective at creating disutility. I understand why people want to be optimistic, but I think being pessimistic in this case is more responsible.
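A back-of-the-envelope version of that claim, with purely illustrative numbers:

```python
# Toy expected-value comparison of the "pit" vs. the "peak" (illustrative numbers only).
p_pit, p_peak = 0.01, 0.99     # chance of a hellscape vs. an FAI win
u_peak = 1.0                   # value of an FAI outcome, arbitrary units
u_pit = -100.0 * u_peak        # hellscape assumed ~100x worse than FAI is good

expected_value = p_pit * u_pit + p_peak * u_peak
print(round(expected_value, 3))  # -0.01: the 1% pit slightly outweighs the 99% peak
```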
So basically you are saying that the situation is asymmetric: the impact/magnitude of possible bad things is much much greater than the impact/magnitude of possible good things. Is this correct?
Yeah. One sign of asymmetry is that creating two universes, one filled with pleasure and the other filled with pain, feels strongly negative rather than symmetric to us. Another sign is that pain is an internal experience, while our values might refer to the external world (though it’s very murky), so the former might be much easier to achieve. Another sign is that in our world it’s much easier to create a life filled with pain than a life that fulfills human values.
Yes, many people intuitively feel that a universe of pleasure and a universe of pain add to a net negative. But I suspect that’s just a result of experiencing (and avoiding) lots of sources of extreme pain in our lives, while sources of pleasure tend to be diffuse and relatively rare. The human experience of pleasure is conjunctive because in order to survive and reproduce you must fairly reliably avoid all types of extreme pain. But in a pleasure-maximizing environment, removing pain will be a given.
It’s also true that our brains tend to adapt to pleasure over time, but that seems simple to modify once physiological constraints are removed.
“one filled with pleasure and the other filled with pain, feels strongly negative rather than symmetric to us”
Comparing pains and pleasures of similar magnitude? People have a tendency not to do this, see the linked thread.
“Another sign is that pain is an internal experience, while our values might refer to the external world (though it’s very murky)”
You accept pain and risk of pain all the time to pursue various pleasures, desires and goals. Mice will cross electrified surfaces for tastier treats.
If you’re going to care about hedonic states as such, why treat the external case differently?
Alternatively, if you’re going to dismiss pleasure as just an indicator of true goals (e.g. that pursuit of pleasure as such is ‘wireheading’) then why not dismiss pain in the same way, as just a signal and not itself a goal?
Comparing pains and pleasures of similar magnitude?
My point was comparing pains and pleasures that could be generated with similar amount of resources. Do you think they balance out for human decision making? For example, I’d strongly disagree to create a box of pleasure and a box of pain, do you think my preference would go away after extrapolation?
“My point was comparing pains and pleasures that could be generated with similar amount of resources. Do you think they balance out for human decision making?”
I think with current tech it’s cheaper and easier to wirehead to increase pain (i.e. torture) than to increase pleasure or reduce pain. This makes sense biologically, since organisms won’t go looking for ways to wirehead to maximize their own pain, evolution doesn’t need to ‘hide the keys’ as much as with pleasure or pain relief (where the organism would actively seek out easy means of subverting the behavioral functions of the hedonic system). Thus when powerful addictive drugs are available, such as alcohol, human populations evolve increased resistance over time. The sex systems evolve to make masturbation less rewarding than reproductive sex under ancestral conditions, desire for play/curiosity is limited by boredom, delicious foods become less pleasant when full or the foods are not later associated with nutritional sensors in the stomach, etc.
I don’t think this is true with fine control over the nervous system (or a digital version) to adjust felt intensity and behavioral reinforcement. I think with that sort of full access one could easily increase the intensity (and ease of activation) of pleasures/mood such that one would trade them off against the most intense pains at ~parity per second, and attempts at subjective comparison when or after experiencing both would put them at ~parity.
People will willingly undergo very painful jobs and undertakings for money, physical pleasures, love, status, childbirth, altruism, meaning, etc. Unless you have a different standard for the ‘boxes’ than used in subjective comparison with rich experience of the things to be compared, I think we’re just haggling over the price re intensity.
We know the felt caliber and behavioral influence of such things can vary greatly. It would be possible to alter nociception and pain receptors to amp up or damp down any particular pain. This could even involve adding a new sense, e.g. someone with congenital deafness could be given the ability to hear (installing new nerves and neurons), and hear painful sounds, with artificially set intensity of pain. Likewise one could add a new sense (or dial one up) to enable stronger pleasures. I think that both the new pains and new pleasures would ‘count’ to the same degree (and if you’re going to dismiss the pleasures as ‘wireheading’ then you should dismiss the pains too).
“For example, I’d strongly disagree to create a box of pleasure and a box of pain, do you think my preference would go away after extrapolation?”
You trade off pain and pleasure in your own life, are you saying that the standard would be different for the boxes than for yourself?
What are you using as the examples to represent the boxes, and have you experienced them? (As discussed in my link above, people often use weaksauce examples in such comparison.)
We could certainly make agents for whom pleasure and pain would use equal resources per util. The question is whether human preferences today (or extrapolated) would sympathize with such agents to the point of giving them the universe. Their decision-making could look very inhuman to us. If we value such agents with a discount factor, we’re back at square one.
That’s what the congenital deafness discussion was about.
You have preferences over pain and pleasure intensities that you haven’t experienced, or new durations of experiences you know. Otherwise you wouldn’t have anything to worry about re torture, since you haven’t experienced it.
Pain asymbolia is a condition in which pain is perceived, but with an absence of the suffering that is normally associated with the pain experience. Individuals with pain asymbolia still identify the stimulus as painful but do not display the behavioral or affective reactions that usually accompany pain; no sense of threat and/or danger is precipitated by pain.
Suppose you currently had pain asymbolia. Would that mean you wouldn’t object to pain and suffering in non-asymbolics? What if you personally had only happened to experience extremely mild discomfort while having lots of great positive experiences? What about for yourself? If you knew you were going to get a cure for your pain asymbolia tomorrow would you object to subsequent torture as intrinsically bad?
We can go through similar stories for major depression and positive mood.
Seems it’s the character of the experience that matters.
Likewise, if you’ve never experienced skiing, chocolate, favorite films, sex, victory in sports, and similar things that doesn’t mean you should act as though they have no moral value. This also holds true for enhanced experiences and experiences your brain currently is unable to have, like the case of congenital deafness followed by a procedure to grant hearing and listening to music.
Music and chocolate are known to be mostly safe. I guess I’m more cautious about new self-modifications that can change my decisions massively, including decisions about more self-modifications. It seems like if I’m not careful, you can devise a sequence that will turn me into a paperclipper. That’s why I discount such agents for now, until I understand better what CEV means.
conflict seems to be the most plausible scenario (and one which has a high prior placed on it as we can observe that much suffering today is caused by conflict), but it seems to be less and less likely of a scenario once you factor in superintelligence, as multi-polar scenarios seem to be either very short-lived or unlikely to happen at all.
This seems plausible but not obvious to me. Humans are superintelligent as compared to chimpanzees (let alone, say, Venus flytraps), but humans have still formed a multipolar civilization.
When thinking about whether s-risk scenarios are tied to or come about by similar means as x-risk scenarios (such as a malign intelligence explosion), the relevant issue to me seems to be whether or not such a scenario could result in a multi-polar conflict of cosmic proportions. I think the chance of that happening is quite low, since intelligence explosions seem to be most likely to result in a singleton.
Due to complexity and fragility of human values, any superintelligence that fulfills them will probably be adjacent in design space to many other superintelligences that cause lots of suffering (which is also much cheaper), so a wrong superintelligence might take over due to human error or malice or arms races. That’s where most s-risk is coming from, I think. The one in a million number seems optimistic, actually.
I agree that preventing s-risks is important, but I will try to look at possible counterarguments:
A benevolent AI will be able to fight an acausal war against an evil AI in another branch of the multiverse by creating more happy copies of me, or more paths from a suffering observer-moment to a happy observer-moment. So creating a benevolent superintelligence will help against suffering everywhere in the multiverse.
Non-existence is the worst form of suffering if we define suffering as whatever goes against our most important values. Thus x-risks are s-risks. Pain is not always suffering, as masochists exist.
If we give too much weight to animal suffering, we give ground to projects like the Voluntary Human Extinction Movement, and so increase the chances of human extinction, since it was humans who created factory farms. Moreover, if we agree that non-existence is not suffering, we could kill all life on Earth and stop all suffering, which is not right.
A benevolent AI will be able to resurrect all possible sentient beings and animals and provide them an infinite paradise, thus compensating for any current suffering of animals.
Only infinite and unbearable suffering is bad. We should distinguish unbearable suffering, like agony, from ordinary suffering, which is just a reinforcement learning signal for the wetware of our brains, informing us about past wrong decisions or the need to call a doctor.
I think a longer explanation is needed to show how a benevolent AI would save observers from an evil AI. It is not just compensation for suffering. It is based on the idea of indexical uncertainty between identical observers: if two identical observer-moments exist, neither knows which one it is. So a benevolent AI creates 1000 copies of an observer-moment which is in the jail of an evil AI, and constructs a pleasant next moment for each copy. From the point of view of the jailed observer-moment, there are 1001 expected future moments, and only 1 of them consists of continued suffering. So the expected duration of his suffering will be less than a second. However, to win such a game the benevolent AI needs an overwhelming advantage in computing power, and some other assumptions about the nature of personal identity need to hold.
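A toy version of that calculation, taking the comment’s (contested) assumptions about indexical uncertainty at face value and supposing 1000 happy continuations are added at every moment of suffering:

```python
# At each moment, 1000 of the 1001 subjectively indistinguishable continuations
# are pleasant, so only 1 in 1001 keeps suffering (per the assumptions above).
p_continue = 1 / 1001

# Expected number of additional suffering moments is a geometric series:
#   sum over t >= 1 of p_continue**t = p_continue / (1 - p_continue) = 1/1000
expected_extra_suffering_moments = p_continue / (1 - p_continue)
print(expected_extra_suffering_moments)  # ~0.001, i.e. "less than a second" if a moment lasts ~1s
```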
I agree that some outcomes, like eternal very strong suffering, are worse, but it is important to think about non-existence as a form of suffering, as it will help us in utilitarian calculations and will help to show that x-risks are a type of s-risk.
There are more people in the world who care about animal suffering than about x-risks, and giving them a new argument increases the probability of x-risks.
What do you mean by “Also it’s about animals for some reason, let’s talk about them when hell freezes over.”? We could provide happiness to all animals and provide indefinite survival to their species, which would otherwise go extinct completely within millions of years.
Do you mean finite but unbearable suffering, like intense pain for one year?
EDITED: It looks like you changed your long reply while I was writing the long answer to all your counterarguments.
I think my own rough future philosophy is making sure that the future increases humanity’s autonomy. I think it transforms into S-risk reduction, assuming that autonomous people will choose to reduce their suffering and their potential future suffering if they can. It also transforms the tricky philosophical question of defining suffering into the tricky philosophical question of defining autonomy, which might be a trade that is preferred.
I think I prefer the autonomy increase because I do not have to try to predict the emotional reaction of humans/agents to events. People could claim immense suffering from seeing me wearing bright/clashing clothing, but if I leave them and their physical environment alone, I’m not decreasing their autonomy. It also suggests positive things to do (GiveDirectly) rather than just avoidance of low-autonomy outcomes.
There is however tension between the individual increase in autonomy and the increase in autonomy of the society.
So I don’t have much experience with philosophy; this is mainly a collection of my thoughts as I read through.
1) S-risks seem to basically describe hellscapes, situations of unimaginable suffering. Is that about right?
2) Two assumptions here seem to be valuing future sentience and the additive nature of utility/suffering. Are these typical stances to be taking? Should there be some sort of discounting happening here?
3) I’m pretty sure I’m strawmanning here, but I can’t but feel like there’s some sort of argument by definition here where we first defined s-risks as the worst things possible, then concluded that we should work on them because EAs might want to avert the worst things possible. It seems...a little vacuous?
4) In UNSONG, someone mentioned that Thamiel is basically an anti-Friendly AI in that he’s roughly the inverse of our human values. That is, actual Unfriendliness (i.e. being designed to maximize suffering) seems to subtly encode a lot of dense information about human suffering much the same way that Friendliness does. So I guess I’m trying to say that causing s-risks to happen actually seems to be a pretty hard problem, at least one that requires far more nuanced models than merely extinction.
In the case of AI going wrong, I currently find it far more plausible that extinction happens, rather than a hellscape scenario. It seems to me that we’d need to get like 90% of the way to alignment and then take a sharp turn for s-risks to happen, and given that we haven’t really made much substantial progress in alignment, I guess I’m unconvinced?
5) Oh, wait, looks like you covered point 4 about 3/4ths of the way down the page.
6) Additional arguments for s-risks seem to be based upon suffering of other potential sapient beings we create. I haven’t read Tomasik’s stuff, so I can’t say that much here, except that it seems to me that sapience might not equal capacity for suffering?
7) Your conclusion seems a little strong. I agree that conflicts can cause localized suffering (e.g. torturing people during wartime), but the arguments seem to rest quite a bit on proposed future sentient beings, which, I dunno, don’t seem as imminent? (For context, I’m worried about x-risks because projections in the next 100 years spread across things like climate change and more unpredictable things like AI paint a fairly bleak picture.)
8) I’m just noting that, should any x-risk come to pass, this solves the s-risk problem for humans / things related to humans. But there could just as well be sapient aliens suffering elsewhere, I guess.
causing s-risks to happen actually seems to be a pretty hard problem, at least one that requires far more nuanced models than merely extinction.
To maximize human suffering per unit of space-time, you need a good model of human values, just like a Friendly AI.
But to create an astronomical amount of human suffering (without really maximizing it), you only need to fill an astronomical amount of space-time with humans living in bad conditions, and prevent them from escaping those conditions. Relatively easier.
Instead of Thamiel, imagine immortal Pol Pot with space travel.
We could also add a-risks: the risk that human civilisation will destroy alien life and alien civilisations. For example, an LHC false-vacuum catastrophe or a UFAI could dangerously affect the entire visible universe and kill an unknown number of alien civilisations or prevent their existence.
Preventing risks to alien life is one of the main reasons for the sterilisation of Mars rovers and for the sinking of Galileo and Cassini into Jupiter and Saturn at the end of their missions.
The flip side of this idea is “cosmic rescue missions” (term coined by David Pearce), which refers to the hypothetical scenario in which human civilization helps to reduce the suffering of sentient extraterrestrials (in the original context, it referred to the use of technology to abolish suffering). Of course, this is more relevant for simple animal-like aliens and less so for advanced civilizations, which would presumably have already either implemented a similar technology or decided to reject such technology. Brian Tomasik argues that cosmic rescue missions are unlikely.
Also, there’s an argument that humanity conquering alien civs would only be considered bad if you assume that either (1) we have non-universalist-consequentialist reasons to believe that preventing alien civilizations from existing is bad, or (2) the alien civilization would produce greater universalist-consequentialist value than human civilizations with the same resources. If (2) is the case, then humanity should actually be willing to sacrifice itself to let the aliens take over (as in the “utility monster” thought experiment), assuming that universalist consequentialism is true. If neither (1) nor (2) holds, then human civilization would have greater value than ET civilization. Seth Baum’s paper on universalist ethics and alien encounters goes into greater detail.
Thanks for the links. My thought was that we may assign higher negative utility to those x-risks which are able to become a-risks too, that is, the LHC and AI.
If you know Russian science fiction by the Strugatsky brothers, there is an idea in it of “Progressors”: people who are embedded in other civilisations to help them develop more quickly. In the end, the main character concluded that such actions violate a civilisation’s right to determine its own way, and he returned to Earth to search for and stop possible alien progressors here.
Iain Banks has similar themes in his books—e.g. Inversions. And generally speaking, in the Culture universe, the Special Circumstances are a meddlesome bunch.
Feedback: I had to scroll a very long way until I found out what “s-risk” even was. By then I had lost interest, mainly because generalizing from fiction is not useful.
Thank you for your feedback. I’ve added a paragraph at the top of the post that includes the definition of s-risk and refers readers already familiar with the concept to another article.
Thanks for the feedback! The first sentence below the title slide says: “I’ll talk about risks of severe suffering in the far future, or s-risks.” Was this an insufficient definition for you? Would you recommend a different definition?
Is it still a facepalm given the rest of the sentence? “So, s-risks are roughly as severe as factory farming, but with an even larger scope.” The word “severe” is being used in a technical sense (discussed a few paragraphs earlier) to mean something like “per individual badness” without considering scope.
I think the claim that s-risks are roughly as severe as factory farming “per individual badness” is unsubstantiated. But it is reasonable to claim that experiencing either would be worse than death, “hellish”. Remember, Hell has circles.
The section presumes that the audience agrees wrt veganism. To an audience who isn’t on board with EA veganism, that line comes across as the “arson, murder, and jaywalking” trope.
Notably, the great majority of them don’t have the slightest clue about farming in general or factory farming in particular. Don’t mistake social signaling for actual positions.
As the expression about knowing “how the sausage is made” attests, generally the more people learn about it, the less they like it.
Of course, veganism is very far from being an immediate consequence of disliking factory farming. (Similarly, refusing to pay taxes is very far from being an immediate consequence of disliking government policy.)
As the expression about knowing “how the sausage is made” attests, generally the more people learn about it, the less they like it.
That’s not obvious to me.
I agree that the more people are exposed to anti-factory-farming propaganda, the more they are influenced by it, but that’s not quite the same thing, is it?
Facepalm was a severe understatement, this quote is a direct ticket to the loony bin. I recommend poking your head out of the bubble once in a while—it’s a whole world out there. For example, some horrible terrible no-good people—like me—consider factory farming to be an efficient way of producing a lot of food at reasonable cost.
This sentence reads approximately as “Literal genocide (e.g. Rwanda) is roughly as severe as using a masculine pronoun with respect to a nonspecific person, but with an even larger scope”.
The steeliest steelman that I can come up with is that you’re utterly out of touch with the Normies.
I sympathize with your feeling of alienation at the comment, and thanks for offering this perspective that seems outlandish to me. I don’t think I agree with you re who the ‘normies’ are, but I suspect that this may not be a fruitful thing to even argue about.
Side note: I’m reminded of the discussion here. (It seems tricky to find a good way to point out that other people are presenting their normative views in a way that signals an unfair consensus, without getting into/accused of identify politics or having to throw around words like “loony bin” or fighting over who the ‘normies’ are.)
Yes, we clearly have very different worldviews. I don’t think alienation is the right word here, it’s just that different people think about the world differently and IMHO that’s perfectly fine (to clarify, I mean values and normative statements, not facts). And, of course, you have no obligation at all to do something about it.
If it makes sense to continue adding letters to different risks, l-risks could be identified, that is the risks that kill all life on earth. The main difference for us, humans, that there are zero chances of the new civilisation of Earth in that case.
Wow!
An obvious first question is whether the existence of suffering-hating civilizations on balance increases s-risk (mostly by introducing game-theoretic incentives) or decreases s-risk (by exerting their influence to prevent suffering, esp. via acausal trade). If the latter, then x-risk and s-risk reduction may end up being aligned. If the former, then at best the s-riskers are indifferent to survival and need to resort to more speculative interventions. Interestingly, in this case it may also be counterproductive for s-riskers to expand their influence or acquire resources. My guess is that mature suffering-hating civilizations reduce s-risk, since immature suffering-hating civilizations probably provide a significant part of the game-theoretic incentive yet have almost no influence, and sane suffering-hating civilizations will provide minimal additional incentives to create suffering. But I haven’t thought about this issue very much.
Paul, thank you for the substantive comment!
Carl’s post sounded weird to me, because large amounts of human utility (more than just pleasure) seem harder to achieve than large amounts of human disutility (for which pain is enough). You could say that some possible minds are easier to please, but human utility doesn’t necessarily value such minds enough to counterbalance s-risk.
Brian’s post focuses more on possible suffering of insects or quarks. I don’t feel quite as morally uncertain about large amounts of human suffering, do you?
As to possible interventions, you have clearly thought about this for longer than me, so I’ll need time to sort things out. This is quite a shock.
Carl gave a reason that future creatures, including potentially very human-like minds, might diverge from current humans in a way that makes hedonium much more efficient. If you assigned significant probability to that kind of scenario, it would quickly undermine your million-to-one ratio. Brian’s post briefly explains why you shouldn’t argue “If there is a 50% chance that x-risks are 2 million times worse, then they are a million times worse in expectation.” (I’d guess that there is a good chance, say > 25%, that good stuff can be as efficient as bad stuff.)
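To make the ratio-versus-expectation point concrete, here is a toy calculation with made-up numbers (two equally likely hypotheses about how efficiently resources convert into optimized happiness versus optimized suffering; nothing here is taken from Carl’s or Brian’s posts):

# Toy illustration (made-up numbers) of why E[bad/good] differs from E[bad]/E[good].
# Hypothesis H1 (p=0.5): optimized happiness is as resource-efficient as optimized suffering.
# Hypothesis H2 (p=0.5): optimized happiness is 2,000,000x less efficient.
bad = 1.0                                 # suffering producible per unit of resources (same under both hypotheses)
good_h1, good_h2 = 1.0, 1.0 / 2_000_000   # happiness producible under H1 and H2

expectation_of_ratio = 0.5 * (bad / good_h1) + 0.5 * (bad / good_h2)   # ~1,000,000
ratio_of_expectations = bad / (0.5 * good_h1 + 0.5 * good_h2)          # ~2

print(expectation_of_ratio, ratio_of_expectations)

Averaging the ratio suggests “a million times worse,” while the ratio of expected values comes out around 2; which of the two quantities is decision-relevant is exactly what is at issue.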
I would further say: existing creatures often prefer to keep living even given the possibility of extreme pain. This can be easily explained by an evolutionary story, which suffering-focused utilitarians tend to view as a debunking explanation: given that animals would prefer to keep living regardless of the actual balance of pleasure and pain, we shouldn’t infer anything from that preference. But our strong dispreference for intense suffering has a similar evolutionary origin, and is no more reflective of underlying moral facts than is our strong preference for survival.
In support of this, my System 1 reports that if it sees more intelligent people taking s-risk seriously, it is less likely to nuke the planet if it gets the chance. (I’m not sure I endorse nuking the planet; I’m just reporting an emotional reaction.)
Can you elaborate on what you mean by this? People like Brian or others at FRI don’t seem particularly averse to philosophical deliberation to me...
I support this compromise and agree not to destroy the world. :-)
Those of us who sympathize with suffering-focused ethics have an incentive to encourage others to think about their values now, at least in crude enough terms to take a stance on prioritizing the prevention of s-risks vs. making sure we get to a position where everyone can safely deliberate their values further and then have them fulfilled. Conversely, if one (normatively!) thinks the downsides of bad futures are unlikely to be much worse than the upsides of good futures, then one is incentivized to promote caution about taking confident stances on anything population-ethics-related, and to instead value deeper philosophical reflection. The latter also has the upside of being good from a cooperation point of view: everyone can work on the same priority (building safe AI that helps with philosophical reflection) regardless of one’s inklings about how personal value extrapolation is likely to turn out.
(The situation becomes more interesting/complicated for suffering-focused altruists once we add considerations of multiverse-wide compromise via coordinated decision-making, which, in extreme versions at least, would call for being “updateless” about the direction of one’s own values.)
People vary in what kinds of values change they would consider drift vs. endorsed deliberation. Brian has in the past publicly come down unusually far on the side of “change = drift,” I’ve encountered similar views on one other occasion from this crowd, and I had heard second hand that this was relatively common.
Brian or someone more familiar with his views could speak more authoritatively to that aspect of the question, and I might be mistaken about the views of the suffering-focused utilitarians more broadly.
Did you mean to say, “if the latter” (such that x-risk and s-risk reduction are aligned when suffering-hating civilizations decrease s-risk), rather than “if the former”?
I feel a weird disconnect on reading comments like this. I thought s-risks were a part of conventional wisdom on here all along. (We even had an infamous scandal that concerned one class of such risks!) Scott didn’t “see it before the rest of us”—he was drawing on an existing, and by now classical, memeplex.
It’s like when some people spoke as if nobody had ever thought of AI risk until Bostrom wrote Superintelligence—even though that book just summarized what people (not least of whom Bostrom himself) had already been saying for years.
I guess I didn’t think about it carefully before. I assumed that s-risks were much less likely than x-risks (true) so it’s okay not to worry about them (false). The mistake was that logical leap.
In terms of utility, the landscape of possible human-built superintelligences might look like a big flat plain (paperclippers and other things that kill everyone without fuss), with a tall sharp peak (FAI) surrounded by a pit that’s astronomically deeper (many almost-FAIs and other designs that sound natural to humans). The pit needs to be compared to the peak, not the plain. If the pit is more likely, I’d rather have the plain.
Was it obvious to you all along?
Didn’t you realize this yourself back in 2012?
I didn’t realize then that disutility of human-built AI can be much larger than utility of FAI, because pain is easier to achieve than human utility (which doesn’t reduce to pleasure). That makes the argument much stronger.
This argument doesn’t actually seem to be in the article that Kaj linked to. Did you see it somewhere else, or come up with it yourself? I’m not sure it makes sense, but I’d like to read more if it’s written up somewhere. (My objection is that “easier to achieve” doesn’t necessarily mean the maximum value achievable is higher. It could be that it would take longer or more effort to achieve the maximum value, but the actual maximums aren’t that different. For example, maybe the extra stuff needed for human utility (aside from pleasure) is complex but doesn’t actually cost much in terms of mass/energy.)
The argument somehow came to my mind yesterday, and I’m not sure it’s true either. But do you really think human value might be as easy to maximize as pleasure or pain? Pain is only about internal states, and human value seems to be partly about external states, so it should be way more expensive.
One of the more crucial points, I think, is that positive utility is – for most humans – complex and its creation is conjunctive. Disutility, in contrast, is disjunctive. Consequently, the probability of creating the former is smaller than the probability of creating the latter – all else being equal (of course, all else is not equal).
In other words, the scenarios leading towards the creation of (large amounts of) positive human value are conjunctive: to create a highly positive future, we have to eliminate (or at least substantially reduce) physical pain and boredom and injustice and loneliness and inequality (at least certain forms of it) and death, etc. etc. etc. (You might argue that getting “FAI” and “CEV” right would accomplish all those things at once (true) but getting FAI and CEV right is, of course, a highly conjunctive task in itself.)
In contrast, disutility is much more easily created and essentially disjunctive. Many roads lead towards dystopia: sadistic programmers, failing AI safety wholesale (or “only” failing at value-loading, extrapolation, or stable self-modification), a totalitarian regime taking over, etc. etc.
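As a rough numerical sketch of the conjunctive/disjunctive asymmetry (made-up probabilities, with independence assumed purely for illustration):

# Toy numbers: a highly positive future requires getting n independent things right,
# while a substantially bad future only requires at least one of m things going badly wrong.
p_each_right = 0.9   # assumed chance of getting any single requirement right
q_each_wrong = 0.1   # assumed chance of any single failure mode occurring
n = m = 10

p_utopia = p_each_right ** n                    # ~0.35: every conjunct must hold
p_some_dystopia = 1 - (1 - q_each_wrong) ** m   # ~0.65: one disjunct suffices

print(round(p_utopia, 2), round(p_some_dystopia, 2))

All else is of course not equal, as noted above; the sketch only shows how conjunctive requirements get penalized as the number of conjuncts grows.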
It’s also not a coincidence that even the most untalented writer with the most limited imagination can conjure up a convincing dystopian society. Envisioning a true utopia in concrete detail, on the other hand, is nigh impossible for most human minds.
Footnote 10 of the above-mentioned s-risk article makes a related point (emphasis mine):
“[...] human intuitions about what is valuable are often complex and fragile (Yudkowsky, 2011), taking up only a small area in the space of all possible values. In other words, the number of possible configurations of matter constituting anything we would value highly (under reflection) is arguably smaller than the number of possible configurations that constitute some sort of strong suffering or disvalue, making the incidental creation of the latter ceteris paribus more likely.”
Consequently, UFAIs such as paperclippers are more likely to incidentally create large amounts of disutility than utility (factoring out acausal considerations), e.g. because creating simulations is instrumentally useful for them.
Generally, I like how you put it in your comment here:
Yeah. In a nutshell, supporting generic x-risk-reduction (which also reduces extinction risks) is in one’s best interest, if and only if one’s own normative trade-ratio of suffering vs. happiness is less suffering-focused than one’s estimate of the ratio of expected future happiness to suffering (feel free to replace “happiness” with utility and “suffering” with disutility). If one is more pessimistic about the future or if one needs large amounts of happiness to trade-off small amounts of suffering, one should rather focus on s-risk-reduction instead. Of course, this simplistic analysis leaves out issues like cooperation with others, neglectedness, tractability, moral uncertainty, acausal considerations, etc.
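A minimal way to write that rule of thumb down, with N as one’s normative exchange rate (units of happiness needed to outweigh one unit of suffering) and H, S as expected future happiness and suffering conditional on survival; the symbols and numbers below are mine, not from the comment:

# Sketch of the stated criterion: generic x-risk reduction looks good by your own lights
# iff your estimate of E[future happiness] / E[future suffering] exceeds your normative
# trade-ratio N (happiness units required to offset one unit of suffering).
def favors_x_risk_reduction(expected_happiness, expected_suffering, trade_ratio_n):
    # The future is net positive by your values iff H > N * S, i.e. H / S > N.
    return expected_happiness > trade_ratio_n * expected_suffering

print(favors_x_risk_reduction(100.0, 1.0, trade_ratio_n=10.0))    # True: optimistic estimate, mild trade-ratio
print(favors_x_risk_reduction(100.0, 1.0, trade_ratio_n=1000.0))  # False: strongly suffering-focused trade-ratio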
Do you think that makes sense?
Yeah, I also had the idea about utility being conjunctive and mentioned it in a deleted reply to Wei, but then realized that Eliezer’s version (fragility of value) already exists and is better argued.
On the other hand, maybe the worst hellscapes can be prevented in one go, if we “just” solve the problem of consciousness and tell the AI what suffering means. We don’t need all of human value for that. Hellscapes without suffering can also be pretty bad in terms of human value, but not quite as bad, I think. Of course solving consciousness is still a very tall order, but it might be easier than solving all philosophy that’s required for FAI, and it can lead to other shortcuts like in my recent post (not that I’d propose them seriously).
Some people at MIRI might be thinking about this under the heading of nonperson predicates. (Eliezer’s view on which computations matter morally is different from the one endorsed by Brian, though.) And maybe it’s important not to limit FAI options too much by preventing mindcrime at all costs – if there are benefits against other very bad failure modes (or – cooperatively – just increased controllability for the people who care a lot about utopia-type outcomes), maybe some mindcrime in the early stages to ensure goal-alignment would be the lesser evil.
Human disutility includes more than just pain too. The destruction of humanity (the flat plain you describe) carries a great deal of negative utility for me, even if I disappear without feeling any pain at all. There’s more disutility if all life is destroyed, and more if the universe as a whole is destroyed… I don’t think there’s any fundamental asymmetry. Pain and pleasure are the most immediate ways of affecting value, and probably the ones that can be achieved most efficiently in computronium, so external states probably don’t come into play much at all if you take a purely utilitarian view.
Our values might say, for example, that a universe filled with suffering insects is very undesirable, but a universe filled with happy insects isn’t very desirable. More generally, if our values are a conjunction of many different values, then it’s probably easier to create a universe where one is strongly negative and the rest are zero, than a universe where all are strongly positive. I haven’t seen the argument written up, I’m trying to figure it out now.
Huh, I feel very differently. For AI risk specifically, I thought the conventional wisdom was always “if AI goes wrong, the most likely outcome is that we’ll all just die, and the next most likely outcome is that we get a future which somehow goes against our values even if it makes us very happy.” And besides AI risk, other x-risks haven’t really been discussed at all on LW. I don’t recall seeing any argument for s-risks being a particularly plausible category of risks, let alone one of the most important ones.
It’s true that there was That One Scandal, but the reaction to that was quite literally Let’s Never Talk About This Again—or alternatively Let’s Keep Bringing This Up To Complain About How It Was Handled, depending on the person in question—and even then people only seemed to be talking about that specific incident and argument. I never saw anyone draw the conclusion that “hey, this looks like an important subcategory of x-risks that warrants separate investigation and dedicated work to avoid”.
There was some discussion back in 2012 and sporadically since then. (ETA: You can also do a search for “hell simulations” and get a bunch more results.)
I’ve always thought that in order to prevent astronomical suffering, we will probably want to eventually (i.e., after a lot of careful thought) build an FAI that will colonize the universe and stop any potential astronomical suffering arising from alien origins and/or try to reduce suffering in other universes via acausal trade etc., so the work isn’t very different from other x-risk work. But now that the x-risk community is larger, maybe it does make sense to split out some of the more s-risk specific work?
It seems like the most likely reasons to create suffering come from the existence of suffering-hating civilizations. Do you think that it’s clear/very likely that it is net helpful for there to be more mature suffering-hating civilizations? (On the suffering-focused perspective.)
My intuition is that there is no point in trying to answer questions like these before we know a lot more about decision theory, metaethics, metaphilosophy, and normative ethics, so pushing for a future where these kinds of questions eventually get answered correctly (and the answers make a difference in what happens) seems like the most important thing to do. It doesn’t seem to make sense to try to lock in some answers (i.e., make our civilization suffering-hating or not suffering-hating) on the off chance that when we figure out what the answers actually are, it will be too late. Someone with much less moral/philosophical uncertainty than I do would perhaps prioritize things differently, but I find it difficult to motivate myself to think really hard from their perspective.
This question seems like a major input into whether x-risk reduction is useful.
If we try to answer the question now, it seems very likely we’ll get the answer wrong (given my state of uncertainty about the inputs that go into the question). I want to keep civilization going until we know better how to answer these types of questions. For example if we succeed in building a correctly designed/implemented Singleton FAI, it ought to be able to consider this question at leisure, and if it becomes clear that the existence of mature suffering-hating civilizations actually causes more suffering to be created, then it can decide to not make us into a mature suffering-hating civilization, or take whatever other action is appropriate.
Are you worried that by the time such an FAI (or whatever will control our civilization) figures out the answer, it will be too late? (Why? If we can decide that x-risk reduction is bad, then so can it. If it’s too late to alter or end civilization at that point, why isn’t it already too late for us?) Or are you worried more that the question won’t be answered correctly by whatever will control our civilization?
If you are concerned exclusively with suffering, then increasing the number of mature civilizations is obviously bad and you’d prefer that the average civilization not exist. You might think that our descendants are particularly good to keep around, since we hate suffering so much. But in fact almost all s-risks occur precisely because of civilizations that hate suffering, so it’s not at all clear that creating “the civilization that we will become on reflection” is better than creating “a random civilization” (which is bad).
To be clear, even if we have modest amounts of moral uncertainty I think it could easily justify a “wait and see” style approach. But if we were committed to a suffering-focused view then I don’t think your argument works.
It seems just as plausible to me that suffering-hating civilizations reduce the overall amount of suffering in the multiverse, so I think I’d wait until it becomes clear which is the case, even if I was concerned exclusively with suffering. But I haven’t thought about this question much, since I haven’t had a reason to assume an exclusive concern with suffering, until you started asking me to.
Earlier in this thread I’d been speaking from the perspective of my own moral uncertainty, not from a purely suffering-focused view, since we were discussing the linked article, and Kaj had written:
What’s your reason for considering a purely suffering-focused view? Intellectual curiosity? Being nice to or cooperating with people like Brian Tomasik by helping to analyze one of their problems?
Understanding the recommendations of each plausible theory seems like a useful first step in decision-making under moral uncertainty.
Perhaps this, in case it turns out to be highly important but difficult to get certain ingredients – e.g. priors or decision theory – exactly right. (But I have no idea, it’s also plausible that suboptimal designs could patch themselves well, get rescued somehow, or just have their goals changed without much fuss.)
That sort of subject is inherently implicit in the kind of decision-theoretic questions that MIRI-style AI research involves. More generally, when one is thinking about astronomical-scale questions, and aggregating utilities, and so on, it is a matter of course that cosmically bad outcomes are as much of a theoretical possibility as cosmically good outcomes.
Now, the idea that one might need to specifically think about the bad outcomes, in the sense that preventing them might require strategies separate from those required for achieving good outcomes, may depend on additional assumptions that haven’t been conventional wisdom here.
Right, I took this idea to be one of the main contributions of the article, and assumed that this was one of the reasons why cousin_it felt it was important and novel.
Thanks for voicing this sentiment I had upon reading the original comment. My impression was that negative utilitarian viewpoints / things of this sort had been trending for far longer than cousin_it’s comment might suggest.
The article isn’t specifically negative utilitarian, though—even classical utilitarians would agree that having astronomical amounts of suffering is a bad thing. Nor do you have to be a utilitarian in the first place to think it would be bad: as the article itself notes, pretty much all major value systems probably agree on s-risks being a major Bad Thing:
Yes, but the claim that that risk needs to be taken seriously is certainly not conventional wisdom around here.
Decision theory (which includes the study of risks of that sort) has long been a core component of AI-alignment research.
No, it doesn’t. Decision theory deals with abstract utility functions. It can talk about outcomes A, B, and C where A is preferred to B and B is preferred to C, but doesn’t care whether A represents the status quo, B represents death, and C represents extreme suffering, or whether A represents gaining lots of wealth and status, B represents the status quo, and C represents death, so long as the ratios of utility differences are the same in each case. Decision theory has nothing to do with the study of s-risks.
The first and last sentences of the parent comment do not follow from the statements in between.
That doesn’t seem to refute or change what Alex said?
What Alex said doesn’t seem to refute or change what I said.
But also: I disagree with the parent. I take conventional wisdom here to include support for MIRI’s agent foundations agenda, which includes decision theory, which includes the study of such risks (even if only indirectly or implicitly).
Fair enough. I guess I didn’t think carefully about it before. I assumed that s-risks were much less likely than x-risks (true) and so they could be discounted (false). It seems like the right way to imagine the landscape of superintelligences is a vast flat plain (paperclippers and other things that kill everyone without fuss) with a tall thin peak (FAIs) surrounded by a pit that’s astronomically deeper (FAI-adjacent and other designs). The right comparison is between the peak and the pit, because if the pit is more likely, I’d rather have the plain.
I think the reason why cousin_it’s comment is upvoted so much is that a lot of people (including me) weren’t really aware of S-risks or how bad they could be. It’s one thing to just make a throwaway line that S-risks could be worse, but it’s another thing entirely to put together a convincing argument.
Similar ideas have appeared in other articles, but they’ve framed it in terms of energy-efficiency while defining weird words such as computronium or the two-envelopes problem, which makes it much less clear. I don’t think I saw the links for either of those articles before, but if I had, I probably wouldn’t have read them.
I also think that the title helps. S-risks is a catchy name, especially if you already know x-risks. I know that this term has been used before, but it wasn’t used in the title. Further, while it is quite a good article, you can read the summary, introduction and conclusion without encountering the idea that the author believes s-risks are much greater than x-risks, as opposed to being just yet another risk to worry about.
I think there’s definitely an important lesson to be drawn here. I wonder how many other articles have gotten close to an important truth, but just failed to hit it out of the park for some reason or another.
Interesting!
I’m only confident about endorsing this conclusion conditional on having values where reducing suffering matters a great deal more than promoting happiness. So we wrote the “Reducing risks of astronomical suffering” article in a deliberately ‘balanced’ way, pointing out the different perspectives, which is why it didn’t end up making any very strong claims. I don’t find the energy-efficiency point convincing at all, but for those who do, x-risks are likely (though not with very high confidence) still more important, mainly because more futures will be optimized for good outcomes than for bad outcomes, and this is where most of the value is likely to come from. The “pit” around the FAI peak is in expectation extremely bad compared to anything that exists currently, but most of it is just accidental suffering that is still comparatively unoptimized. So in the end, whether s-risks or x-risks are more important to work on at the margin depends on how suffering-focused or not someone’s values are.
Having said that, I totally agree that more people should be concerned about s-risks and it’s concerning that the article (and the one on suffering-focused AI safety) didn’t manage to convey this point well.
That sounds like a recipe for Pascal’s Mugging.
Only if you think one in a million events are as rare as meeting god in person.
The article that introduced the term “s-risk” was shared on LessWrong in October 2016. The content of the article and the talk seem similar.
Did you simply not come across it or did the article just (catastrophically) fail to explain the concept of s-risks and its relevance?
I’ve seen similar articles before, but somehow this was the first one that shook me. Thank you for doing this work!
And the concept is much older than that. The 2011 Felicifia post “A few dystopic future scenarios” by Brian Tomasik outlined many of the same considerations that FRI works on today (suffering simulations, etc.), and of course Brian has been blogging about risks of astronomical suffering since then. FRI itself was founded in 2013.
Iain Banks’ Surface Detail published in 2010 featured a war over the existence of virtual hells (simulations constructed explicitly to punish the ems of sinners).
The only counterarguments I can think of would be:
The claim that the likelihood of s-risks is close to that of x-risks seems not well argued to me. In particular, conflict seems to be the most plausible scenario (and one which has a high prior placed on it, since we can observe that much suffering today is caused by conflict), but it seems to become a less and less likely scenario once you factor in superintelligence, as multi-polar scenarios seem to be either very short-lived or unlikely to happen at all.
We should be wary of applying anthropomorphic traits to hypothetical artificial agents in the future. Pain in biological organisms may very well have evolved as a proxy to negative utility, and might not be necessary in “pure” agent intelligences which can calculate utility functions directly. It’s not obvious to me that implementing suffering in the sense that humans understand it would be cheaper or more efficient for a superintelligence to do instead of simply creating utility-maximizers when it needs to produce a large number of sub-agents.
High overlap between approaches to mitigating x-risk and approaches to mitigating s-risks. If the best chance of mitigating future suffering is trying to bring about a friendly artificial intelligence explosion, then it seems that the approaches we are currently taking should still be the correct ones.
More speculatively: If we focus heavily on s-risks, does this open us up to issues regarding utility-monsters? Can I extort people by creating a simulation of trillions of agents and then threaten to minimize their utility? (If we simply value the sum of utility, and not necessarily the complexity of the agent having the utility, then this should be relatively cheap to implement).
I think the most general response to your first three points would look something like this: Any superintelligence that achieves human values will be adjacent in design space to many superintelligences that cause massive suffering, so it’s quite likely that the wrong superintelligence will win, due to human error, malice, or arms races.
As to your last point, it looks more like a research problem than a counterargument, and I’d be very interested in any progress on that front :-)
Why so? Flipping the sign doesn’t get you “adjacent”, it gets you “diametrically opposed”.
If you really want chocolate ice cream, “adjacent” would be getting strawberry ice cream, not having ghost pepper extract poured into your mouth.
They said “adjacent in design space”. The Levenshtein distance between “return val;” and “return -val;” is 1.
So being served a cup of coffee and being served a cup of pure capsaicin are “adjacent in design space”? Maybe, but funny how that problem doesn’t arise or even worry anyone...
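For anyone who wants to check the edit-distance aside, here is a small self-contained sketch (a standard dynamic-programming Levenshtein implementation, not anything referenced in the thread):

def levenshtein(a, b):
    # Classic dynamic-programming edit distance: insertions, deletions, substitutions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution (free if characters match)
        prev = curr
    return prev[-1]

print(levenshtein("return val;", "return -val;"))  # 1: a single inserted character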
More like driving to the store and driving into the brick wall of the store are adjacent in design space.
That’s a twist on a standard LW argument; see e.g. here:
It seems to me that fragility of value can lead to massive suffering in many ways.
You’re basically dialing that argument up to eleven. From “losing a small part could lead to unacceptable results” you are jumping to “losing any small part will lead to unimaginable hellscapes”:
Yeah, not all parts. But even if it’s a 1% chance, one hellscape might balance out a hundred universes where FAI wins. Pain is just too effective at creating disutility. I understand why people want to be optimistic, but I think being pessimistic in this case is more responsible.
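A back-of-the-envelope version of that comparison, with made-up magnitudes (the only assumption carried over from the comment is that one hellscape is roughly a hundred times worse than an FAI win is good):

# Toy expected-value comparison (made-up numbers).
u_fai = 1.0           # value of a universe where FAI wins
u_hellscape = -100.0  # a hellscape assumed ~100x worse than an FAI win is good
u_plain = 0.0         # paperclipper-style outcomes: everyone dies without fuss

p_fai = 0.99
p_hell = 0.01         # even a 1% chance of the pit...

ev_attempt = p_fai * u_fai + p_hell * u_hellscape   # roughly -0.01: slightly negative overall
ev_plain = u_plain                                  # 0.0: the flat plain

print(ev_attempt, ev_plain)  # under these numbers, "I'd rather have the plain"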
So basically you are saying that the situation is asymmetric: the impact/magnitude of possible bad things is much much greater than the impact/magnitude of possible good things. Is this correct?
Yeah. One sign of asymmetry is that creating two universes, one filled with pleasure and the other filled with pain, feels strongly negative rather than symmetric to us. Another sign is that pain is an internal experience, while our values might refer to the external world (though it’s very murky), so the former might be much easier to achieve. Another sign is that in our world it’s much easier to create a life filled with pain than a life that fulfills human values.
Yes, many people intuitively feel that a universe of pleasure and a universe of pain add to a net negative. But I suspect that’s just a result of experiencing (and avoiding) lots of sources of extreme pain in our lives, while sources of pleasure tend to be diffuse and relatively rare. The human experience of pleasure is conjunctive because in order to survive and reproduce you must fairly reliably avoid all types of extreme pain. But in a pleasure-maximizing environment, removing pain will be a given.
It’s also true that our brains tend to adapt to pleasure over time, but that seems simple to modify once physiological constraints are removed.
“one filled with pleasure and the other filled with pain, feels strongly negative rather than symmetric to us”
Comparing pains and pleasures of similar magnitude? People have a tendency not to do this; see the linked thread.
“Another sign is that pain is an internal experience, while our values might refer to the external world (though it’s very murky)”
You accept pain and risk of pain all the time to pursue various pleasures, desires and goals. Mice will cross electrified surfaces for tastier treats.
If you’re going to care about hedonic states as such, why treat the external case differently?
Alternatively, if you’re going to dismiss pleasure as just an indicator of true goals (e.g. that pursuit of pleasure as such is ‘wireheading’) then why not dismiss pain in the same way, as just a signal and not itself a goal?
My point was comparing pains and pleasures that could be generated with similar amount of resources. Do you think they balance out for human decision making? For example, I’d strongly disagree to create a box of pleasure and a box of pain, do you think my preference would go away after extrapolation?
“My point was comparing pains and pleasures that could be generated with similar amount of resources. Do you think they balance out for human decision making?”
I think with current tech it’s cheaper and easier to wirehead to increase pain (i.e. torture) than to increase pleasure or reduce pain. This makes sense biologically, since organisms won’t go looking for ways to wirehead to maximize their own pain, evolution doesn’t need to ‘hide the keys’ as much as with pleasure or pain relief (where the organism would actively seek out easy means of subverting the behavioral functions of the hedonic system). Thus when powerful addictive drugs are available, such as alcohol, human populations evolve increased resistance over time. The sex systems evolve to make masturbation less rewarding than reproductive sex under ancestral conditions, desire for play/curiosity is limited by boredom, delicious foods become less pleasant when full or the foods are not later associated with nutritional sensors in the stomach, etc.
I don’t think this is true with fine control over the nervous system (or a digital version) to adjust felt intensity and behavioral reinforcement. I think with that sort of full access one could easily increase the intensity (and ease of activation) of pleasures/mood such that one would trade them off against the most intense pains at ~parity per second, and attempts at subjective comparison when or after experiencing both would put them at ~parity.
People will willingly undergo very painful jobs and undertakings for money, physical pleasures, love, status, childbirth, altruism, meaning, etc. Unless you have a different standard for the ‘boxes’ than the one used in subjective comparison with rich experience of the things to be compared, I think we are just haggling over the price re intensity.
We know the felt caliber and behavioral influence of such things can vary greatly. It would be possible to alter nociception and pain receptors to amp up or damp down any particular pain. This could even involve adding a new sense, e.g. someone with congenital deafness could be given the ability to hear (installing new nerves and neurons), and hear painful sounds, with artificially set intensity of pain. Likewise one could add a new sense (or dial one up) to enable stronger pleasures. I think that both the new pains and new pleasures would ‘count’ to the same degree (and if you’re going to dismiss the pleasures as ‘wireheading’ then you should dismiss the pains too).
“For example, I’d strongly disagree to create a box of pleasure and a box of pain, do you think my preference would go away after extrapolation?”
You trade off pain and pleasure in your own life, are you saying that the standard would be different for the boxes than for yourself?
What are you using as the examples to represent the boxes, and have you experienced them? (As discussed in my link above, people often use weaksauce examples in such comparison.)
We could certainly make agents for whom pleasure and pain would use equal resources per util. The question is if human preferences today (or extrapolated) would sympathize with such agents to the point of giving them the universe. Their decision-making could look very inhuman to us. If we value such agents with a discount factor, we’re back at square one.
That’s what the congenital deafness discussion was about.
You have preferences over pain and pleasure intensities that you haven’t experienced, or over new durations of experiences that you do know. Otherwise you wouldn’t have anything to worry about re torture, since you haven’t experienced it.
Consider people with pain asymbolia:
Suppose you currently had pain asymbolia. Would that mean you wouldn’t object to pain and suffering in non-asymbolics? What if you personally had only happened to experience extremely mild discomfort while having lots of great positive experiences? What about for yourself? If you knew you were going to get a cure for your pain asymbolia tomorrow would you object to subsequent torture as intrinsically bad?
We can go through similar stories for major depression and positive mood.
Seems it’s the character of the experience that matters.
Likewise, if you’ve never experienced skiing, chocolate, favorite films, sex, victory in sports, and similar things that doesn’t mean you should act as though they have no moral value. This also holds true for enhanced experiences and experiences your brain currently is unable to have, like the case of congenital deafness followed by a procedure to grant hearing and listening to music.
Music and chocolate are known to be mostly safe. I guess I’m more cautious about new self-modifications that can change my decisions massively, including decisions about more self-modifications. It seems like if I’m not careful, you can devise a sequence that will turn me into a paperclipper. That’s why I discount such agents for now, until I understand better what CEV means.
This seems plausible but not obvious to me. Humans are superintelligent as compared to chimpanzees (let alone, say, Venus flytraps), but humans have still formed a multipolar civilization.
When thinking about whether s-risk scenarios are tied to or come about by similar means as x-risk scenarios (such as a malign intelligence explosion), the relevant issue to me seems to be whether or not such a scenario could result in a multi-polar conflict of cosmic proportions. I think the chance of that happening is quite low, since intelligence explosions seem to be most likely to result in a singleton.
Due to complexity and fragility of human values, any superintelligence that fulfills them will probably be adjacent in design space to many other superintelligences that cause lots of suffering (which is also much cheaper), so a wrong superintelligence might take over due to human error or malice or arms races. That’s where most s-risk is coming from, I think. The one in a million number seems optimistic, actually.
I agree that preventing s-risks is important, but I will try to look for possible counterarguments:
A benevolent AI will be able to fight an acausal war against an evil AI in another branch of the multiverse by creating more happy copies of me, or more paths from suffering observer-moments to happy observer-moments. So creating a benevolent superintelligence will help against suffering everywhere in the multiverse.
Non-existence is the worst form of suffering if we define suffering as whatever goes against our most important values. Thus x-risks are s-risks. Pain is not always suffering, as masochists exist.
If we give too much weight to animal suffering, we give ground to projects like the Voluntary Human Extinction Movement, and so we increase the chances of human extinction, since humans are the ones who created animal farms. Moreover, if we agree that non-existence is not suffering, we could kill all life on Earth and stop all suffering, which is not right.
A benevolent AI will be able to resurrect all possible sentient beings and animals and provide them with infinite paradise, thus compensating for any current suffering of animals.
Only infinite and unbearable suffering is bad. We should distinguish unbearable suffering, like agony, from ordinary suffering, which is just a reinforcement learning signal for the wetware of our brains, informing us about past wrong decisions or the need to call a doctor.
I think all of these are quite unconvincing and the argument stays intact, but thanks for coming up with them.
I think a longer explanation is needed to show how a benevolent AI could save observers from an evil AI. It is not just compensation for suffering; it is based on the idea of indexical uncertainty between identical observers. If two identical observer-moments exist, an observer doesn’t know which of them he is. So a benevolent AI creates 1000 copies of an observer-moment which is in the jail of the evil AI, and constructs a pleasant next moment for each copy. From the point of view of the jailed observer-moment, there are 1001 expected future moments, and only 1 of them consists of continued suffering. So the expected duration of his suffering will be less than a second. However, to win such a game the benevolent AI needs to have an overwhelming advantage in computing power, and some other assumptions about the nature of personal identity need to be resolved.
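Taking the comment’s setup at face value (uniform self-sampling over successor observer-moments, 1000 rescue copies, one-second moments; these are the comment’s assumptions, not established results), the arithmetic looks like this:

# Toy calculation under the comment's assumptions about indexical uncertainty.
jailed_successors = 1    # the original next moment, still inside the evil AI's jail
rescue_copies = 1000     # pleasant next moments created by the benevolent AI
total = jailed_successors + rescue_copies

p_continued_suffering = jailed_successors / total          # ~0.001
expected_suffering_seconds = p_continued_suffering * 1.0   # per one-second observer-moment

print(round(p_continued_suffering, 4), round(expected_suffering_seconds, 4))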
I agree that some outcomes, like eternal very strong suffering, are worse, but it is important to think about non-existence as a form of suffering, as this will help us in utilitarian calculations and will help show that x-risks are a type of s-risk.
There are more people in the world who care about animal suffering than about x-risks, and giving them a new argument increases the probability of x-risks.
What do you mean by “Also it’s about animals for some reason, let’s talk about them when hell freezes over.”? We could provide happiness to all animals and indefinite survival to their species, which otherwise will go completely extinct within millions of years.
Do you mean finite but unbearable suffering, like intense pain for one year?
EDITED: It looks like you changed your long reply while I was writing a long answer to all your counterarguments.
X-risk is still plausibly worse in that we need to survive to reach as much of the universe as possible and eliminate suffering in other places.
Edit: Brian talks about this here: https://foundational-research.org/risks-of-astronomical-future-suffering/#Spread_of_wild_animals-2
Interesting to see another future philosophy.
I think my own rough future philosophy is making sure that the future has an increase in autonomy for humanity. I think it transforms into s-risk reduction, assuming that autonomous people will choose to reduce their suffering and their potential future suffering if they can. It also transforms the tricky philosophical question of defining suffering into the tricky philosophical question of defining autonomy, which might be a preferable trade.
I think I prefer the autonomy increase because I do not have to try to predict the emotional reaction of humans/agents to events. People could claim immense suffering from seeing me wearing bright/clashing clothing. But if I leave them and their physical environment alone, I’m not decreasing their autonomy. It also suggests positive things to do (GiveDirectly) rather than just the avoidance of low-autonomy outcomes.
There is however tension between the individual increase in autonomy and the increase in autonomy of the society.
As usual, xkcd is relevant.
So I don’t have much experience with philosophy; this is mainly a collection of my thoughts as I read through.
1) S-risks seem to basically describe hellscapes, situations of unimaginable suffering. Is that about right?
2) Two assumptions here seem to be valuing future sentience and the additive nature of utility/suffering. Are these typical stances to be taking? Should there be some sort of discounting happening here?
3) I’m pretty sure I’m strawmanning here, but I can’t help but feel like there’s some sort of argument by definition here, where we first defined s-risks as the worst things possible, then concluded that we should work on them because EAs might want to avert the worst things possible. It seems... a little vacuous?
4) In UNSONG, someone mentioned that Thamiel is basically an anti-Friendly AI in that he’s roughly the inverse of our human values. That is, actual Unfriendliness (i.e. an AI designed to maximize suffering) seems to subtly encode a lot of dense information about human suffering in much the same way that Friendliness does. So I guess I’m trying to say that causing s-risks to happen actually seems to be a pretty hard problem, at least one that requires far more nuanced models than merely extinction.
In the case of AI going wrong, I currently find it far more plausible that extinction happens, rather than a hellscape scenario. It seems to me that we’d need to get like 90% of the way to alignment and then take a sharp turn for s-risks to happen, and given that we haven’t really made much substantial progress in alignment, I guess I’m unconvinced?
5) Oh, wait, looks like you covered point 4 about 3/4ths of the way down the page.
6) Additional arguments for s-risks seem to be based upon suffering of other potential sapient beings we create. I haven’t read Tomasik’s stuff, so I can’t say that much here, except that it seems to me that sapience might not equal capacity for suffering?
7) Your conclusion seems a little strong. I agree that conflicts can cause localized suffering (e.g. torturing people during wartime), but the arguments seem to rest quite a bit on proposed future sentient beings, which, I dunno, don’t seem as imminent? (For context, I’m worried about x-risks because projections in the next 100 years spread across things like climate change and more unpredictable things like AI paint a fairly bleak picture.)
8) I’m just noting that, should any x-risk come to pass, this solves the s-risk problem for humans / things related to humans. But there could just as well be sapient aliens suffering elsewhere, I guess.
To maximize human suffering per unit of space-time, you need a good model of human values, just like a Friendly AI.
But to create astronomical amount of human suffering (without really maximizing it), you only need to fill astronomical amount of space-time with humans living in bad conditions, and prevent them from escaping those conditions. Relatively easier.
Instead of Thamiel, imagine immortal Pol Pot with space travel.
Ah, okay. Thanks for the clarification here.
We could also add a-risks: the risk that human civilisation will destroy alien life and alien civilisations. For example, an LHC-triggered false vacuum catastrophe or a UFAI could dangerously affect the whole visible universe and kill an unknown number of alien civilisations or prevent their existence.
Preventing risks to alien life is one of the main motivations for the sterilisation of Mars rovers and for sinking Galileo and Cassini into Jupiter and Saturn at the end of their missions.
The flip side of this idea is “cosmic rescue missions” (term coined by David Pearce), which refers to the hypothetical scenario in which human civilization help to reduce the suffering of sentient extraterrestrials (in the original context, it referred to the use of technology to abolish suffering). Of course, this is more relevant for simple animal-like aliens and less so for advanced civilizations, which would presumably have already either implemented a similar technology or decided to reject such technology. Brian Tomasik argues that cosmic rescue missions are unlikely.
Also, there’s an argument that humanity conquering alien civs would only be considered bad if you assume that either (1) we have non-universalist-consequentialist reasons to believe that preventing alien civilizations from existing is bad, or (2) the alien civilization would produce greater universalist-consequentialist value than human civilizations with the same resources. If (2) is the case, then humanity should actually be willing to sacrifice itself to let the aliens take over (like in the “utility monster” thought experiment), assuming that universalist consequentialism is true. If neither (1) nor (2) holds, then human civilization would have greater value than ET civilization. Seth Baum’s paper on universalist ethics and alien encounters goes into greater detail.
Thanks for the links. My thought was that we might assign higher negative utility to those x-risks which could also become a-risks, that is, the LHC and AI.
If you know the Russian science fiction of the Strugatsky brothers, there is an idea in it of “Progressors”: people who are embedded into other civilisations to help them develop more quickly. In the end, the main character concluded that such actions violate the right of any civilisation to determine its own way, and he returned to Earth to search for and stop possible alien Progressors here.
Oh, in those cases, the considerations I mentioned don’t apply. But I still thought they were worth mentioning.
In Star Trek, the Federation has a “Prime Directive” against interfering with the development of alien civilizations.
The main role of which is to figure in this recurring dialogue:
-- Captain, but the Prime Directive!
-- Screw it, we’re going in.
Iain Banks has similar themes in his books—e.g. Inversions. And generally speaking, in the Culture universe, the Special Circumstances are a meddlesome bunch.
Want to improve the wiki page on s-risk? I started it a few months ago but it could use some work.
Feedback: I had to scroll a very long way until I found out what “s-risk” even was. By then I had lost interest, mainly because generalizing from fiction is not useful.
You might like this better:
https://foundational-research.org/reducing-risks-of-astronomical-suffering-a-neglected-priority/
Thank you for your feedback. I’ve added a paragraph at the top of the post that includes the definition of s-risk and refers readers already familiar with the concept to another article.
Thanks for the feedback! The first sentence below the title slide says: “I’ll talk about risks of severe suffering in the far future, or s-risks.” Was this an insufficient definition for you? Would you recommend a different definition?
Direct quote: “So, s-risks are roughly as severe as factory farming”
/facepalm
Is it still a facepalm given the rest of the sentence? “So, s-risks are roughly as severe as factory farming, but with an even larger scope.” The word “severe” is being used in a technical sense (discussed a few paragraphs earlier) to mean something like “per individual badness” without considering scope.
I think the claim that s-risks are roughly as severe as factory farming “per individual badness” is unsubstantiated. But it is reasonable to claim that experiencing either would be worse than death, “hellish”. Remember, Hell has circles.
The section presumes that the audience agrees wrt veganism. To an audience who isn’t on board with EA veganism, that line comes across as the “arson, murder, and jaywalking” trope.
A lot of people who disagree with veganism agree that factory farming is terrible. Like, more than 50% of the population I’d say.
Notably, the great majority of them don’t have the slightest clue about farming in general or factory farming in particular. Don’t mistake social signaling for actual positions.
As the expression about knowing “how the sausage is made” attests, generally the more people learn about it, the less they like it.
Of course, veganism is very far from being an immediate consequence of disliking factory farming. (Similarly, refusing to pay taxes is very far from being an immediate consequence of disliking government policy.)
That’s not obvious to me.
I agree that the more people are exposed to anti-factory-farming propaganda, the more they are influenced by it, but that’s not quite the same thing, is it?
Facepalm was a severe understatement, this quote is a direct ticket to the loony bin. I recommend poking your head out of the bubble once in a while—it’s a whole world out there. For example, some horrible terrible no-good people—like me—consider factory farming to be an efficient way of producing a lot of food at reasonable cost.
This sentence reads approximately as “Literal genocide (e.g. Rwanda) is roughly as severe as using a masculine pronoun with respect to a nonspecific person, but with an even larger scope”.
The steeliest steelman that I can come up with is that you’re utterly out of touch with the Normies.
I sympathize with your feeling of alienation at the comment, and thanks for offering this perspective that seems outlandish to me. I don’t think I agree with you re who the ‘normies’ are, but I suspect that this may not be a fruitful thing to even argue about.
Side note: I’m reminded of the discussion here. (It seems tricky to find a good way to point out that other people are presenting their normative views in a way that signals an unfair consensus, without getting into/accused of identity politics or having to throw around words like “loony bin” or fighting over who the ‘normies’ are.)
Yes, we clearly have very different worldviews. I don’t think alienation is the right word here, it’s just that different people think about the world differently and IMHO that’s perfectly fine (to clarify, I mean values and normative statements, not facts). And, of course, you have no obligation at all to do something about it.
Yeah, that part is phrased poorly :-/
If it makes sense to continue adding letters for different kinds of risks, l-risks could be identified, that is, risks that kill all life on Earth. The main difference for us humans is that there is zero chance of a new civilisation arising on Earth in that case.
But the term y-risks is still free. What could it be?
The risk that we think about risks too much and never do anything interesting?
Or maybe risks of accidental p-zombing… but that is a z-risk.
The S is for “Skitter”