building misaligned smarter-than-human systems will kill everyone, including their children [...] if they come to understand this central truth.
I’d like to once again reiterate that the arguments for misaligned AIs killing literally all humans (if they succeed in takeover) are quite weak and probably literally all humans dying conditional on AI takeover is unlikely (<50% likely).
(To be clear, I think there is a substantial chance of at least 1 billion people dying and that AI takeover is very bad from a longtermist perspective.)
This is due to:
The potential for the AI to be at least a tiny bit “kind” (just as humans probably wouldn’t kill all aliens). [1]
Decision theory/trade reasons
This is discussed in more detail here and here. (There is also some discussion here.)
(This content is copied from here and there is some discussion there.)
Further, as far as I can tell, central thought leaders of MIRI (Eliezer, Nate Soares) don’t actually believe that misaligned AI takeover will lead to the deaths of literally all humans:
Here Eliezer says:
I sometimes mention the possibility of being stored and sold to aliens a billion years later, which seems to me to validly incorporate most all the hopes and fears and uncertainties that should properly be involved, without getting into any weirdness that I don’t expect Earthlings to think about validly.
Here Soares notes:
I’m somewhat persuaded by the claim that failing to mention even the possibility of having your brainstate stored, and then run-and-warped by an AI or aliens or whatever later, or run in an alien zoo later, is potentially misleading.
I think it doesn’t cost that much more to just keep humans physically alive, so if you’re imagining scanning for uploads, just keeping people alive is also plausible IMO. Perhaps this is an important crux?!?
Insofar as MIRI is planning on focusing on straightforwardly saying what they think as a comms strategy, it seems important to resolve this issue.
I’d personally be happy with either:
Change the message to something like “building misaligned smarter-than-human systems will kill high fractions of humanity, including their children” (insofar as MIRI believes this)
More seriously defend the claims that AIs will only retain a subset of brain scans for aliens (rather than keeping humans alive and happy or quickly reviving all humans into a good-for-each-human situation) and, where reasonable, include “AIs might retain brain scans that are later revived”. (E.g., there is no need to include this in central messaging; I think it is reasonable (from an onion honesty perspective) to describe AIs as “killing all of us” if they physically kill all humans and then brain scan only a subset and sell these brain scans.)
(Edit: I clarify some of where I’m coming from here.)
[1] This includes the potential for the AI to generally have preferences that are morally valuable from a typical human perspective.
The more complex message sounds like a great way to make the public communication more complex and off-putting.
The difference between killing everyone and killing almost everyone while keeping a few alive for arcane purposes does not matter to most people, nor should it.
I agree that the arguments for misaligned AGI killing absolutely everyone aren’t solid, but the arguments against that seem at least as shaky. So rounding it to “might quite possibly kill everyone” seems fair and succinct.
From the other thread where this comment originated: the argument that AGI won’t kill everyone because people wouldn’t kill everyone seems very bad, even when applied to human-imitating LLM-based AGI. People are nice because evolution meticulously made us nice. And even humans have killed an awful lot of people, with no sign they’d stop before killing everyone if it seemed useful for their goals.
Why not “AIs might violently take over the world”?
Seems accurate to the concern while also avoiding any issues here.
That phrase sounds like the Terminator movies to me; it sounds like plucky humans could still band together to overthrow their robot overlords. I want to convey a total loss of control.
In documents where we have more room to unpack concepts I can imagine getting into some of the more exotic scenarios like aliens buying brain scans, but mostly I don’t expect our audiences to find that scenario reassuring in any way, and going into any detail about it doesn’t feel like a useful way to spend weirdness points.
Some of the other things you suggest, like future systems keeping humans physically alive, do not seem plausible to me. Whatever they’re trying to do, there’s almost certainly a better way to do it than by keeping Matrix-like human body farms running.
going into any detail about it doesn’t feel like a useful way to spend weirdness points.
That may be a reasonable consequentialist decision given your goals, but it’s in tension with your claim in the post to be disregarding the advice of people telling you to “hoard status and credibility points, and [not] spend any on being weird.”
Whatever they’re trying to do, there’s almost certainly a better way to do it than by keeping Matrix-like human body farms running.
You’ve completely ignored the arguments from Paul Christiano that Ryan linked to at the top of the thread. (In case you missed it: 1 2.)
The claim under consideration is not that “keeping Matrix-like human body farms running” arises as an instrumental subgoal of “[w]hatever [AIs are] trying to do.” (If you didn’t have time to read the linked arguments, you could have just said that instead of inventing an obvious strawman.)
Rather, the claim is that it’s plausible that the AI we build (or some agency that has decision-theoretic bargaining power with it) cares about humans enough to spend some tiny fraction of the cosmic endowment on our welfare. (Compare to how humans care enough about nature preservation and animal welfare to spend some resources on it, even though it’s a tiny fraction of what our civilization is doing.)
Maybe you think that’s implausible, but if so, there should be a counterargument explaining why Christiano is wrong. As Ryan notes, Yudkowsky seems to believe that some scenarios in which an agency with bargaining power cares about humans are plausible, describing one example of such as “validly incorporat[ing] most all the hopes and fears and uncertainties that should properly be involved, without getting into any weirdness that I don’t expect Earthlings to think about validly.” I regard this statement as undermining your claim in the post that MIRI’s “reputation as straight shooters [...] remains intact.” Withholding information because you don’t trust your audience to reason validly (!!) is not at all the behavior of a “straight shooter”.
I think it makes sense to state the more direct threat-model of literal extinction; though I am also a little confused by the citing of weirdness points… I would’ve said that it makes the whole conversation more complex in a way that (I believe) everyone would reliably end up thinking was not a productive use of time.
(Expanding on this a little: I think that literal extinction is a likely default outcome, and most people who are newly coming to this topic would want to know that this is even in the hypothesis-space and find that to be key information. I think if I said “also maybe they later simulate us in weird configurations like pets for a day every billion years while experiencing insane things” they would not respond “ah, never mind then, this subject is no longer a very big issue”, they would be more like “I would’ve preferred that you had factored this element out of our discussion so far, we spent a lot of time on it yet it still seems to me like the extinction event being on the table is the primary thing that I want to debate”.)
Withholding information because you don’t trust your audience to reason validly (!!) is not at all the behavior of a “straight shooter”.
Hmm, I’m not sure I exactly buy this. I think you should probably follow something like onion honesty which can involve intentionally simplifying your message to something you expect will give the audience more true views. I think you should lean on the side of stating things, but still, sometimes stating a thing which is true can be clearly distracting and confusing and thus you shouldn’t.
Passing the onion test is better than not passing it, but I think the relevant standard is having intent to inform. There’s a difference between trying to share relevant information in the hopes that the audience will integrate it with their own knowledge and use it to make better decisions, and selectively sharing information in the hopes of persuading the audience to make the decision you want them to make.
An evidence-filtering clever arguer can pass the onion test (by not omitting information that the audience would be surprised to learn was omitted) and pass the test of not technically lying (by not making false statements) while failing to make a rational argument in which the stated reasons are the real reasons.
Man I just want to say I appreciate you following up on each subthread and noting where you agree/disagree, it feels earnestly truthseeky to me.
Some of the other things you suggest, like future systems keeping humans physically alive, do not seem plausible to me.
I agree with Gretta here, and I think this is a crux. If MIRI folks thought it were likely that AI will leave a few humans biologically alive (as opposed to information-theoretically revivable), I don’t think we’d be comfortable saying “AI is going to kill everyone”. (I encourage other MIRI folks to chime in if they disagree with me about the counterfactual.)
I also personally have maybe half my probability mass on “the AI just doesn’t store any human brain-states long-term”, and I have less than 1% probability on “conditional on the AI storing human brain-states for future trade, the AI does in fact encounter aliens that want to trade and this trade results in a flourishing human civilization”.
That phrase sounds like the Terminator movies to me; it sounds like plucky humans could still band together to overthrow their robot overlords. I want to convey a total loss of control.
Yeah, seems like a reasonable concern.
FWIW, I also do think that it is reasonably likely that we’ll see conflict between human factions and AI factions (likely with human allies) in which the human factions could very plausibly win. So, personally, I don’t think that “immediate total loss of control” is what people should typically be imagining.
Some of the other things you suggest, like future systems keeping humans physically alive, do not seem plausible to me. Whatever they’re trying to do, there’s almost certainly a better way to do it than by keeping Matrix-like human body farms running.
Insofar as AIs are doing things because they are what existing humans want (within some tiny cost budget), I expect that you should imagine that what actually happens is what humans want (rather than e.g. what the AI thinks they “should want”), insofar as what humans want is cheap.
See also here which makes a similar argument in response to a similar point.
So, if humans don’t end up physically alive but do end up as uploads/body farms/etc., one of a few things must be true:
Humans didn’t actually want to be physically alive and instead wanted to be uploads. In this case, it is very misleading to say “the AI will kill everyone (and sure there might be uploads, but you don’t want to be an upload right?)” because we’re conditioning on people deciding to become uploads!
It was too expensive to keep people physically alive rather than as uploads. I think this is possible but somewhat implausible: the main reasons for cost here (in particular, death due to conflict, or mass slaughter in cases where conflict was the AI’s best option to increase the probability of long-run control) apply to uploads as much as to keeping humans physically alive.
I don’t think slaughtering billions of people would be very useful. As a reference point, wars between countries almost never result in slaughtering that large a fraction of people.
Unfortunately, if the AI really barely cares (e.g. <1/billion caring), it might only need to be barely useful.
I agree it is unlikely to be very useful.
I would like to +1 the “I don’t expect our audiences to find that scenario reassuring in any way”—I would also add that the average policymaker I’ve ever met wouldn’t find a lack of including the exotic scenarios to be in any way inaccurate or deceitful, unless you were way in the weeds for a multi-hour convo and/or they asked you in detail for “well, are there any weird edge cases where we make it through”.
Sure! I like it for brevity and accuracy of both the threat and its seriousness. I’ll try to use it instead of “kill everyone.”
The difference between killing everyone and killing almost everyone while keeping a few alive for arcane purposes does not matter to most people, nor should it.
I basically agree with this as stated, but think these arguments also imply that it is reasonably likely that the vast majority of people will survive misaligned AI takeover (perhaps 50% likely).
I also don’t think this is very well described as arcane purposes:
Kindness is pretty normal.
Decision theory motivations are actually also pretty normal from some perspective: it’s just the generalization of relatively normal “if you wouldn’t have screwed me over and it’s cheap for me, I won’t screw you over”. (Of course, people typically don’t motivate this sort of thing in terms of decision theory, so there is a bit of a midwit meme here.)
You’re right. I didn’t mean to say that kindness is arcane. I was referring to acausal trade or other strange reasons to keep some humans around for possible future use.
Kindness is normal in our world, but I wouldn’t assume it will exist in every or even most situations with intelligent beings. Humans are instinctively kind (except for sociopathic and sadistic people), because that is good game theory for our situation: interactions with peers, in which collaboration/teamwork is useful.
A being capable of real recursive self-improvement, let alone duplication and creation of subordinate minds, is not in that situation. They may temporarily be dealing with peers, but they might reasonably expect to have no need of collaborators in the near future. Thus, kindness isn’t rational for that type of being.
The exception would be if they could make a firm commitment to kindness while they do have peers and need collaborators. They might have kindness merely as an instrumental goal, in which case it would be abandoned as soon as it was no longer useful.
Or they might display kindness more instinctively, as a tendency in their thought or behavior. They might even have it engineered as an innate goal, as Steve hopes to do. In those last two cases, I think it’s possible that reflective stability would keep that kindness in place as the AGI continued to grow, but I wouldn’t bet on it unless kindness was their central goal. If it was merely a tendency and not an explicit and therefore self-endorsed goal, I’d expect it to be dropped like the bad habit it effectively is. If it was an innate goal but not the strongest one, I don’t know but wouldn’t bet on it being long-term reflectively stable under deliberate self-modification.
(As far as I know, nobody has tried hard to work through the logic of reflexive stability of multiple goals. I tried, and gave it up as too vague and less urgent than other alignment questions. My tentative answer was maybe multiple goals would be reflectively stable; it depends on the exact structure of the decision-making process in that AGI/mind).
Here’s another way to frame why this matters.
When you make a claim like “misaligned AIs kill literally everyone”, then reasonable people will be like “but will they?” and you should be in a position where you can defend this claim. But actually, MIRI doesn’t really want to defend this claim against the best objections (or at least they haven’t seriously done so yet AFAICT).
Further, the more MIRI does this sort of move, the more that reasonable potential allies will have to distance themselves.
When you make a claim like “misaligned AIs kill literally everyone”, then reasonable people will be like “but will they?” and you should be in a position where you can defend this claim.
I think most reasonable people will round off “some humans may be kept as brain scans that may have arbitrary cruelties done to them” to be equivalent to “everyone will be killed (or worse)” and not care about this particular point, seeing it as nitpicking that would not make the scenario any less horrible even if it was true.
I disagree. I think it matters a good amount. Like if the risk scenario is indeed “humans will probably get a solar system or two because it’s cheap from the perspective of the AI”. I also think there is a risk of AI torturing the uploads it has, and I agree that if that is the reason why humans are still alive then I would feel comfortable bracketing it, but I think Ryan is arguing more that something like “humans will get a solar system or two and basically get to have decent lives”.
Ryan is arguing more that something like “humans will get a solar system or two and basically get to have decent lives”.
Yep, this is an accurate description, but it is worth emphasizing that I think that horrible violent conflict and other bad outcomes for currently alive humans are reasonably likely.
IMO this is an utter loss scenario, to be clear.
I am not that confident about this. Or like, I don’t know, I do notice my psychological relationship to “all the stars explode” and “earth explodes” is very different, and I am not good enough at morality to be confident about dismissing that difference.
There’s definitely some difference, but I still think that the mathematical argument is just pretty strong, and losing a multiple of 10^23 of your resources for hosting life and fun and goodness seems to me extremely close to “losing everything”.
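Spelling out the arithmetic behind that factor (taking the ~10^23 figure above at face value and rounding “a solar system or two” to two star systems; the exponent is the one cited in the comment, not an independent estimate):
\[
\frac{2}{10^{23}} = 2 \times 10^{-23}
\]
i.e. the retained share of the resource base is about 2×10^-23, which is the sense in which this is “extremely close to losing everything”.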
@habryka I think you’re making a claim about whether or not the difference matters (IMO it does) but I perceived @Kaj_Sotala to be making a claim about whether “an average reasonably smart person out in society” would see the difference as meaningful (IMO they would not).
(My guess is you interpreted “reasonable people” to mean like “people who are really into reasoning about the world and trying to figure out the truth” and Kaj interpreted reasonable people to mean like “an average person.” Kaj should feel free to correct me if I’m wrong.)
The details matter here! Sometimes when (MIRI?) people say “unaligned AIs might be a bit nice and may not literally kill everyone” the modal story in their heads is something like: some brain states of humans are saved on a hard drive somewhere for trade with more competent aliens. And sometimes when other people[1] say “unaligned AIs might be a bit nice and may not literally kill everyone” the modal story in their heads is that some X% of humanity may or may not die in a violent coup, but the remaining humans get to live their normal lives on Earth (or even a solar system or two), with some AI surveillance, but our subjective quality of life might not even be much worse (and might actually be better).
From a longtermist perspective, or a “dignity of human civilization” perspective, maybe the stories are pretty similar. But I expect “the average person” to be much more alarmed by the first story than the second, and not necessarily for bad reasons.
I don’t want to speak for Ryan or Paul, but at least tentatively this is my position: I basically think the difference from a resource management perspective of whether to keep humans around physically vs copies of them saved is ~0 when you have the cosmic endowment to play with, so small idiosyncratic preferences that are significant enough to want to save human brain states should also be enough to be okay with keeping humans physically around; especially if humans strongly express a preference for the latter happening (which I think they do).
Note that “everyone will be killed (or worse)” is a different claim from “everyone will be killed”! (And see Oliver’s point that Ryan isn’t talking about mistreated brain scans.)
Further, as far as I can tell, central thought leaders of MIRI (Eliezer, Nate Soares) don’t actually believe that misaligned AI takeover will lead to the deaths of literally all humans:
This is confusing to me; those quotes are compatible with Eliezer and Nate believing that it’s very likely that misaligned AI takeover leads to the deaths of literally all humans.
Perhaps you’re making some point about how if they think it’s at all plausible that it doesn’t lead to everyone dying, they shouldn’t say “building misaligned smarter-than-human systems will kill everyone”. But that doesn’t seem quite right to me: if someone believed event X will happen with 99.99% probability and they wanted to be succinct, I don’t think it’s very unreasonable to say “X will happen” instead of “X is very likely to happen” (as long as when it comes up at all, they’re honest with their estimates).
I agree these quotes are compatible with them thinking that the deaths of literally all humans are likely conditional on misaligned AI takeover.
I also agree that if they think that it is >75% likely that AI will kill literally everyone, then it seems reasonable and honest to say “misaligned AI takeover will kill literally everyone”.
I also think it seems fine to describe the situation as “killing literally everyone” even if the AIs preserve a subset of humans as brain scans and sell those scans to aliens. (Though probably this should be caveated in various places.)
But, I think that they don’t actually put >75% probability on AI killing literally everyone and these quotes are some (though not sufficient) evidence for this. Or more minimally, they don’t seem to have made a case for the AI killing literally everyone which addresses the decision theory counterargument effectively. (I do think Soares and Eliezer have argued for AIs not caring at all aside from decision theory grounds, though I’m also skeptical about this.)
they don’t seem to have made a case for the AI killing literally everyone which addresses the decision theory counterargument effectively.
I think that’s the crux here. I don’t think the decision theory counterargument alone would move me from 99% to 75% - there are quite a few other reasons my probability is lower than that, but not purely on the merits of the argument in focus here. I would be surprised if that weren’t the case for many others as well, and very surprised if they didn’t put >75% probability on AI killing literally everyone.
I guess my position comes down to: There are many places where I and presumably you disagree with Nate and Eliezer’s view and think their credences are quite different from ours, and I’m confused by the framing of this particular one as something like “this seems like a piece missing from your comms strategy”. Unless you have better reasons than I for thinking they don’t put >75% probability on this—which is definitely plausible and may have happened in IRL conversations I wasn’t a part of, in which case I’m wrong.
I’m confused by the framing of this particular one as something like “this seems like a piece missing from your comms strategy”. Unless you have better reasons than I for thinking they don’t put >75% probability on this—which is definitely plausible and may have happened in IRL conversations I wasn’t a part of, in which case I’m wrong.
Based partially on my in-person interactions with Nate and partially on some amalgamated sense from Nate and Eliezer’s comments on the topic, I don’t think they seem very committed to the view “the AI will kill literally everyone”.
Beyond this, I think Nate’s posts on the topic (here, here, and here) don’t seriously engage with the core arguments (listed in my comment) while simultaneously making a bunch of unimportant arguments that totally bury the lede.[1] See also my review of one of these posts here and Paul’s comment here making basically the same point.
I think it seems unfortunate to:
Make X part of your core comms messaging. (Because X is very linguistically nice.)
Make a bunch of posts hypothetically arguing for conclusion X while not really engaging with the best counterarguments and while making a bunch of points that bury the lede.
When these counterarguments are raised, note that you haven’t really thought much about the topic and that this isn’t much of a crux for you because a high fraction of your motivation is longtermist (see here).
Relevant quote from Nate:
I am not trying to argue with high confidence that humanity doesn’t get a small future on a spare asteroid-turned-computer or an alien zoo or maybe even star if we’re lucky, and acknowledge again that I haven’t much tried to think about the specifics of whether the spare asteroid or the alien zoo or distant simulations or oblivion is more likely, because it doesn’t much matter relative to the issue of securing the cosmic endowment in the name of Fun.
To be clear, I think AIs might kill huge numbers of people. Also, whether misaligned AI takeover kills everyone with >90% probability or kills billions with 50% probability doesn’t affect the bottom line for stopping takeover much from most people’s perspective! I just think it would be good to fix the messaging here to something more solid.
(I have a variety of reasons for thinking this sort of falsehood is problematic which I could get into as needed.)
Edit: note that some of these posts make correct points about unrelated and important questions (e.g. making IMO correct arguments that you very likely can’t bamboozle a high fraction of resources out of an AI using decision theory), I’m just claiming that with respect to the question of “will the AI kill all humans” these posts fail to engage with the strongest arguments and bury the lede.
Two things:
For myself, I would not feel comfortable using language as confident-sounding as “on the default trajectory, AI is going to kill everyone” if I assigned (e.g.) 10% probability to “humanity [gets] a small future on a spare asteroid-turned-computer or an alien zoo or maybe even star”. I just think that scenario’s way, way less likely than that.
I’d be surprised if Nate assigns 10+% probability to scenarios like that, but he can speak for himself. 🤷♂️
I think some people at MIRI have significantly lower p(doom)? And I don’t expect those people to use language like “on the default trajectory, AI is going to kill everyone”.
I agree with you that there’s something weird about making lots of human-extinction-focused arguments when the thing we care more about is “does the cosmic endowment get turned into paperclips”? I do care about both of those things, an enormous amount; and I plan to talk about both of those things to some degree in public communications, rather than treating it as some kind of poorly-kept secret that MIRI folks care about whether flourishing interstellar civilizations get a chance to exist down the line. But I have this whole topic mentally flagged as a thing to be thoughtful and careful about, because it at least seems like an area that contains risk factors for future deceptive comms. E.g., if we update later to expecting the cosmic endowment to be wasted but all humans not dying, I would want us to adjust our messaging even if that means sacrificing some punchiness in our policy outreach.
Currently, however, I think the particular scenario “AI keeps a few flourishing humans around forever” is incredibly unlikely, and I don’t think Eliezer, Nate, etc. would say things like “this has a double-digit probability of happening in real life”? And, to be honest, the idea of myself and my family and friends and every other human being all dying in the near future really fucks me up and does not seem in any sense OK, even if (with my philosopher-hat on) I think this isn’t as big of a deal as “the cosmic endowment gets wasted”.
So I don’t currently feel bad about emphasizing a true prediction (“extremely likely that literally all humans literally nonconsensually die by violent means”), even though the philosophy-hat version of me thinks that the separate true prediction “extremely likely 99+% of the potential value of the long-term future is lost” is more morally important than that. Though I do feel obliged to semi-regularly mention the whole “cosmic endowment” thing in my public communication too, even if it doesn’t make it into various versions of my general-audience 60-second AI risk elevator pitch.
My remaining uncertainty is why you think AIs are so unlikely to keep humans around and treat them reasonably well (e.g. let them live out full lives).
From my perspective the argument that it is plausible that humans are treated well [even if misaligned AIs end up taking over the world and gaining absolute power] goes something like this:
If it only cost <1/million of overall resources to keep a reasonable fraction of humans alive and happy, it’s reasonably likely that misaligned AIs with full control would keep humans alive and happy due to either:
Acausal trade/decision theory
The AI terminally caring at least a bit about being nice to humans (perhaps because it cares a bit about respecting existing nearby agents or perhaps because it has at least a bit of human-like values).
It is pretty likely that it costs <1/million of overall resources (from the AI’s perspective) to keep a reasonable fraction of humans alive and happy. Humans are extremely cheap to keep around asymptotically and I think it can be pretty cheap even initially, especially if you’re a very smart AI.
(See links in my prior comment for more discussion.)
(I also think the argument goes through for 1/billion, but I thought I would focus on the higher value for now.)
FWIW I still stand behind the arguments that I made in that old thread with Paul. I do think the game-theoretical considerations for AI maybe allowing some humans to survive are stronger, but they also feel loopy and like they depend on how good of a job we do on alignment, so I usually like to bracket them in conversations like this (though I agree it’s relevant for the prediction of whether AI will kill literally everyone).
(To be clear, I think there is a substantial chance of at least 1 billion people dying and that AI takeover is very bad from a longtermist perspective.)
Is there a writeup somewhere of how we’re likely to get “around a billion people die” that isn’t extinction, or close to it? Something about this phrasing feels weird/suspicious to me.
Like I have a few different stories for everyone dying (some sooner, or later).
I have some stories where like “almost 8 billion people” die and the AI scans the remainder.
I have some stories where the AI doesn’t really succeed and maybe kills millions of people, in what is more like “a major industrial accident” than “a powerful superintelligence enacting its goals”.
Technically “substantial chance of at least 1 billion people dying” can imply the middle option there, but it sounds like you mean the central example to be closer to a billion than 7.9 billion or whatever. That feels like a narrow target and I don’t really know what you have in mind.
Thinking a bit more, scenarios that seem at least kinda plausible:
“misuse” where someone is just actively trying to use AI to commit genocide or similar. Or, we get into a humans+AI vs humans+AI war.
the AI economy takes off, it has lots of extreme environmental impact, and it’s sort of aligned, but we’re not very good at regulating it fast enough and we only get it under control after a billion deaths.
The AI kills a huge number of people with a bioweapon to destabilize the world and relatively advantage its position.
Massive world war/nuclear war. This could kill 100s of millions easily. 1 billion is probably a bit on the higher end of what you’d expect.
The AI has control of some nations, but thinks that some subset of humans over which it has control pose a net risk such that mass slaughter is a good option.
AIs would prefer to keep humans alive, but there are multiple misaligned AI factions racing and this causes extreme environmental damage.
Technically “substantial chance of at least 1 billion people dying” can imply the middle option there, but it sounds like you mean the central example to be closer to a billion than 7.9 billion or whatever. That feels like a narrow target and I don’t really know what you have in mind.
I think “crazy large scale conflict (with WMDs)” or “mass slaughter to marginally increase odds of retaining control” or “extreme environmental issues” are all pretty central in what I’m imagining.
I think the number of deaths for these is maybe log normally distributed around 1 billion or so. That said, I’m low confidence.
(For reference, if the same fraction of people died as in WW2, it would be around 300 million. So, my view is similar to “substantial chance of a catastrophe which is a decent amount worse than WW2”.)
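For that reference point, the rough arithmetic (using a commonly cited estimate of roughly 80 million WW2 deaths, a 1940 world population of about 2.3 billion, and a current population of about 8.1 billion, all approximate) is:
\[
\frac{8 \times 10^{7}}{2.3 \times 10^{9}} \approx 3.5\%, \qquad 0.035 \times 8.1 \times 10^{9} \approx 2.8 \times 10^{8}
\]
which is consistent with the ~300 million figure above.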
I’m not arguing that you shouldn’t be worried. I’m worried and I work on reducing AI risk as my full-time job. I’m just arguing that it doesn’t seem like true and honest messaging. (In the absence of the various interventions I proposed at the bottom of my comment.)
Okay, then what are your actual probabilities? I’m guessing it’s not sub-20%, otherwise you wouldn’t just say “<50%”, because for me preventing a say 10% chance of extinction is much more important than even a 99% chance of 2B people dying. And your comment was specifically dismissing focus on full extinction due to the <50% chance.
My current view is that conditional on ending up with full misaligned AI control:
20% extinction
50% chance >1 billion humans die or suffer an outcome at least as bad as death.
for me preventing a say 10% chance of extinction is much more important than even a 99% chance of 2B people dying
I don’t see why this would be true:
From a longtermist perspective, we lose control over the lightcone either way (we’re conditioning on full misaligned AI control).
From a perspective where you just care about currently alive beings on planet earth, I don’t see why extinction is that much worse (see the rough numbers below).
From a perspective in which you just want some being to be alive somewhere, I think that expansive notions of the universe/multiverse virtually guarantee this (but perhaps you dismiss this for some reason).
Also, to be clear, perspectives 2 and 3 don’t seem very reasonable to me as terminal philosophical views (rather than e.g. heuristics) as they privilege time and locations in space in a pretty specific way.
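To spell out perspective 2 with the numbers used in this exchange (a 10% chance of extinction versus a 99% chance of 2 billion deaths, with roughly 8 billion people currently alive; a toy expected-deaths calculation that ignores everything beyond the current population):
\[
0.10 \times 8 \times 10^{9} = 8 \times 10^{8} \qquad \text{vs.} \qquad 0.99 \times 2 \times 10^{9} \approx 2 \times 10^{9}
\]
So, counting only deaths among people currently alive, the 2-billion-death scenario is the larger number in expectation; the disagreement in the replies below comes down to how much weight survivors’ continued lives and future people get.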
I have a preference for minds as close to mine as possible continuing to exist, assuming their lives are worth living. If it’s misaligned enough that the remaining humans don’t have good lives, then yes, it doesn’t matter, but I’d just lead with that rather than just the deaths.
And if they do have lives worth living and don’t end up being the last humans, then that leaves us with a lot more positive-human-lived-seconds in the 2B death case.
Sure, but 1. I only put 80% or so on MWI/MUH etc. and 2. I’m talking about optimizing for more positive-human-lived-seconds, not for just a binary ‘I want some humans to keep living’ .
I am dominated by it, and okay, I see what you are saying. Whichever scenario results in a higher chance of human control of the light cone is the one I prefer, and these considerations are relevant only where we don’t control it.
I really want to be able to simply convey that I am worried about outcomes which are similarly bad to “AIs kill everyone”. I put less than 50% on the claim that, conditional on takeover, the AIs leave humans alive because of something like “kindness”. I do think the decision theoretic reasons are maybe stronger, but I also don’t think that is the kind of thing one can convey to the general public.
I think it might be good to have another way of describing the bad outcomes I am worried about.
I like your suggestion of “AIs kill high fractions of humanity, including their children”, although it’s a bit clunky. Some other options, though I’m still not super confident they are better:
AIs totally disempower humanity (I’m worried people will be like “Oh, but aren’t we currently disempowered by capitalism/society/etc”)
Overthrow the US government (maybe good for NatSec stuff, but doesn’t convey the full extent)
When talking to US policymakers, I don’t think there’s a big difference between “causes a national security crisis” and “kills literally everyone.” Worth noting that even though many in the AIS community see a big difference between “99% of people die but civilization restarts” vs. “100% of people die”, IMO this distinction does not matter to most policymakers (or at least matters way less to them).
Of course, in addition to conveying “this is a big deal” you need to convey the underlying threat model. There are lots of ways to interpret “AI causes a national security emergency” (e.g., China, military conflict). “Kills literally everyone” probably leads people to envision a narrower set of worlds.
But IMO even “kills literally everybody” doesn’t really convey the underlying misalignment/AI takeover threat model.
So my current recommendation (weakly held) is probably to go with “causes a national security emergency” or “overthrows the US government” and then accept that you have to do some extra work to actually get them to understand the “AGI → AI takeover → lots of people die and we lose control” model.
Agreed, but initially downvoted due to being obviously unproductive, then upvoted for being an exquisite proof by absurdity about what’s productive: this is the first time I have seen clearly how good communication must forbid some amount of nuance.
The insight: You have a limited amount of time to communicate arguments and models; methods for reproducing some of your beliefs. With most people, you will never have enough time to transmit our entire technoeschatology or xenoeconomics stuff. It is useless to make claims about it, as the recipient has no way of checking them for errors or deceptions. You can only communicate approximations and submodules. No one will ever see the whole truth. (You do not see the whole truth. Your organization, even just within itself, will never agree about the whole truth.)
I’d like to once again reiterate that the arguments for misaligned AIs killing literally all humans (if they succeed in takeover) are quite weak and probably literally all humans dying conditional on AI takeover is unlikely (<50% likely).
(To be clear, I think there is a substantial chance of at least 1 billion people dying and that AI takeover is very bad from a longtermist perspective.)
This is due to:
The potential for the AI to be at least a tiny bit “kind” (same as humans probably wouldn’t kill all aliens). [1]
Decision theory/trade reasons
This is discussed in more detail here and here. (There is also some discussion here.)
(This content is copied from here and there is some discussion there.)
Further, as far as I can tell, central thought leaders of MIRI (Eliezer, Nate Soares) don’t actually believe that misaligned AI takeover will lead to the deaths of literally all humans:
Here Eliezer says:
Here Soares notes:
I think it doesn’t cost that much more to just keep humans physically alive, so if you’re imagining scanning for uploads, just keeping people alive is also plausible IMO. Perhaps this is an important crux?!?
Insofar as MIRI is planning on focusing on straightforwardly saying what they think as a comms strategy, it seems important to resolve this issue.
I’d be personally be happy with either:
Change the message to something like “building misaligned smarter-than-human systems will kill high fractions of humanity, including their children” (insofar as MIRI believes this)
More seriously defend the claims that AIs will only retain a subset brain scans for aliens (rather than keeping humans alive and happy or quickly reviving all humans into a good-for-each-human situation) and where reasonable include “AIs might retain brain scans that are later revived” (E.g. no need to include this in central messaging, I think it is reasonable (from an onion honesty perspective) to describe AIs as “killing all of us” if they physically kill all humans and then brain scan only a subset and sell these brain scans.)
(Edit: I clarify some of where I’m coming from here.)
This includes the potential for the AI to generally have preferences that are morally valueable from a typical human perspective.
The more complex messges sounds like a great way to make the public communication more complex and offputting.
The difference between killing everyone and killing almost everyone while keeping a few alive for arcane purposes does not matter to most people, nor should it.
I agree that the arguments for misaligned AGI killing absolutely everyone aren’t solid, but the arguments against that seem at least as shaky. So rounding it to “might quite possibly kill everyone” seems fair and succinct.
From the other thread where this comment originated: the argument that AGI won’t kill everyone because people wouldn’t kill everyone seems very bad, even when applied to human-imitating LLM-based AGI. People are nice because evolution meticulously made us nice. And even humans have killed an awful lot of people, with no sign they’d stop before killing everyone if it seemed useful for their goals.
Why not “AIs might violently takeover the world”?
Seems accurate to the concern while also avoiding any issues here.
That phrase sounds like the Terminator movies to me; it sounds like plucky humans could still band together to overthrow their robot overlords. I want to convey a total loss of control.
In documents where we have more room to unpack concepts I can imagine getting into some of the more exotic scenarios like aliens buying brain scans, but mostly I don’t expect our audiences to find that scenario reassuring in any way, and going into any detail about it doesn’t feel like a useful way to spend weirdness points.
Some of the other things you suggest, like future systems keeping humans physically alive, do not seem plausible to me. Whatever they’re trying to do, there’s almost certainly a better way to do it than by keeping Matrix-like human body farms running.
That may be a reasonable consequentialist decision given your goals, but it’s in tension with your claim in the post to be disregarding the advice of people telling you to “hoard status and credibility points, and [not] spend any on being weird.”
You’ve completely ignored the arguments from Paul Christiano that Ryan linked to at the top of the thread. (In case you missed it: 1 2.)
The claim under consideration is not that “keeping Matrix-like human body farms running” arises as an instrumental subgoal of “[w]hatever [AIs are] trying to do.” (If you didn’t have time to read the linked arguments, you could have just said that instead of inventing an obvious strawman.)
Rather, the claim is that it’s plausible that the AI we build (or some agency that has decision-theoretic bargaining power with it) cares about humans enough to spend some tiny fraction of the cosmic endowment on our welfare. (Compare to how humans care enough about nature preservation and animal welfare to spend some resources on it, even though it’s a tiny fraction of what our civilization is doing.)
Maybe you think that’s implausible, but if so, there should be a counterargument explaining why Christiano is wrong. As Ryan notes, Yudkowsky seems to believe that some scenarios in which an agency with bargaining power cares about humans are plausible, describing one example of such as “validly incorporat[ing] most all the hopes and fears and uncertainties that should properly be involved, without getting into any weirdness that I don’t expect Earthlings to think about validly.” I regard this statement as undermining your claim in the post that MIRI’s “reputation as straight shooters [...] remains intact.” Withholding information because you don’t trust your audience to reason validly (!!) is not at all the behavior of a “straight shooter”.
I think it makes sense to state the more direct threat-model of literal extinction; though I am also a little confused by the citing of weirdness points… I would’ve said that it makes the whole conversation more complex in a way that (I believe) everyone would reliably end up thinking was not a productive use of time.
(Expanding on this a little: I think that literal extinction is a likely default outcome, and most people who are newly coming to this topic would want to know that this is even in the hypothesis-space and find that to be key information. I think if I said “also maybe they later simulate us in weird configurations like pets for a day every billion years while experiencing insane things” they would not respond “ah, never mind then, this subject is no longer a very big issue”, they would be more like “I would’ve preferred that you had factored this element out of our discussion so far, we spent a lot of time on it yet it still seems to me like the extinction event being on the table is the primary thing that I want to debate”.)
Hmm, I’m not sure I exactly buy this. I think you should probably follow something like onion honesty which can involve intentionally simplifying your message to something you expect will give the audience more true views. I think you should lean on the side of stating things, but still, sometimes stating a thing which is true can be clearly distracting and confusing and thus you shouldn’t.
Passing the onion test is better than not passing it, but I think the relevant standard is having intent to inform. There’s a difference between trying to share relevant information in the hopes that the audience will integrate it with their own knowledge and use it to make better decisions, and selectively sharing information in the hopes of persuading the audience to make the decision you want them to make.
An evidence-filtering clever arguer can pass the onion test (by not omitting information that the audience would be surprised to learn was omitted) and pass the test of not technically lying (by not making false statements) while failing to make a rational argument in which the stated reasons are the real reasons.
Man I just want to say I appreciate you following up on each subthread and noting where you agree/disagree, it feels earnestly truthseeky to me.
I agree with Gretta here, and I think this is a crux. If MIRI folks thought it were likely that AI will leave a few humans biologically alive (as opposed to information-theoretically revivable), I don’t think we’d be comfortable saying “AI is going to kill everyone”. (I encourage other MIRI folks to chime in if they disagree with me about the counterfactual.)
I also personally have maybe half my probability mass on “the AI just doesn’t store any human brain-states long-term”, and I have less than 1% probability on “conditional on the AI storing human brain-states for future trade, the AI does in fact encounter aliens that want to trade and this trade results in a flourishing human civilization”.
Yeah, seems like a reasonable concern.
FWIW, I also do think that it is reasonably likely that we’ll see conflict between human factions and AI factions (likely with humans allies) in which the human factions could very plausibly win. So, personally, I don’t think that “immediate total loss of control” is what people should typically be imagining.
Insofar as AIs are doing things because they are what existing humans want (within some tiny cost budget), then I expect that you should imagine that what actually happens is what humans want (rather than e.g. what the AI thinks they “should want”) insofar as what humans want is cheap.
See also here which makes a similar argument in response to a similar point.
So, if humans don’t end up physically alive but do end up as uploads/body farms/etc one of a few things must be true:
Humans didn’t actually want to be physically alive and instead wanted to be uploads. In this case, it is very misleading to say “the AI will kill everyone (and sure there might be uploads, but you don’t want to be an upload right?)” because we’re conditioning on people deciding to become uploads!
It was too expensive to keep people physically alive rather than uploads. I think this is possible but somewhat implausible: the main reasons for cost here apply to uploads as much as to keeping humans physically alive. In particular, death due to conflict or mass slaughter in cases where conflict was the AI’s best option to increase the probability of long run control.
I don’t think slaughtering billions of people would be very useful. As a reference point, wars between countries almost never result in slaughtering that large a fraction of people
Unfortunately, if the AI really barely cares (e.g. <1/billion caring), it might only need to be barely useful.
I agree it is unlikely to be very useful.
I would like to +1 the “I don’t expect our audiences to find that scenario reassuring in any way”—I would also add that the average policymaker I’ve ever met wouldn’t find a lack of including the exotic scenarios to be in any way inaccurate or deceitful, unless you were way in the weeds for a multi-hour convo and-or they asked you in detail for “well, are there any weird edge cases where we make it through”.
Sure! I like it for brevity and accuracy of both the threat and its seriousness. I’ll try to use it instead of “kill everyone.”
I basically agree with this as stated, but think these arguments also imply that it is reasonably likely that the vast majority of people will survive misaligned AI takeover (perhaps 50% likely).
I also don’t think this is very well described as arcane purposes:
Kindness is pretty normal.
Decision theory motivations is actually also pretty normal from some perspective: it’s just the generalization of relatively normal “if you wouldn’t have screwed me over and it’s cheap for me, I won’t screw you over”. (Of course, people typically don’t motivate this sort of thing in terms of decision theory so there is a bit of a midwit meme here.)
You’re right. I didn’t mean to say that kindness is arcane. I was referring to acausal trade or other strange reasons to keep some humans around for possible future use.
Kindness is normal in our world, but I wouldn’t assume it will exist in every or even most situations with intelligent beings. Humans are instinctively kind (except for sociopathic and sadistic people), because that is good game theory for our situation: interactions with peers, in which collaboration/teamwork is useful.
A being capable of real recursive self-improvement, let alone duplication and creation of subordinate minds is not in that situation. They may temporarily be dealing with peers, but they might reasonably expect to have no need of collaborators in the near future. Thus, kindness isn’t rational for that type of being.
The exception would be if they could make a firm commitment to kindness while they do have peers and need collaborators. They might have kindness merely as an instrumental goal, in which case it would be abandoned as soon as it was no longer useful.
Or they might display kindness more instinctively, as a tendency in their thought or behavior. They might even have it engineered as an innate goal, as Steve hopes to engineer. In those last two cases, I think it’s possible that reflexive stability would keep that kindness in place as the AGI continued to grow, but I wouldn’t bet on it unless kindness was their central goal. If it was merely a tendency and not an explicit and therefore self-endorsed goal, I’d expect it to be dropped like the bad habit it effectively is. If it was an innate goal but not the strongest one, I don’t know but wouldn’t bet on it being long-term reflexively stable under deliberate self-modification.
(As far as I know, nobody has tried hard to work through the logic of reflexive stability of multiple goals. I tried, and gave it up as too vague and less urgent than other alignment questions. My tentative answer was maybe multiple goals would be reflectively stable; it depends on the exact structure of the decision-making process in that AGI/mind).
Here’s another way to frame why this matters.
When you make a claim like “misaligned AIs kill literally everyone”, then reasonable people will be like “but will they?” and you should be a in a position where you can defend this claim. But actually, MIRI doesn’t really want to defend this claim against the best objections (or at least they haven’t seriously done so yet AFAICT).
Further, the more MIRI does this sort of move, the more that reasonable potential allies will have to distance themselves.
I think most reasonable people will round off “some humans may be kept as brain scans that may have arbitrary cruelties done to them” to be equivalent to “everyone will be killed (or worse)” and not care about this particular point, seeing it as nitpicking that would not make the scenario any less horrible even if it was true.
I disagree. I think it matters a good amount. Like if the risk scenario is indeed “humans will probably get a solar system or two because it’s cheap from the perspective of the AI”. I also think there is a risk of AI torturing the uploads it has, and I agree that if that is the reason why humans are still alive then I would feel comfortable bracketing it, but I think Ryan is arguing more that something like “humans will get a solar system or two and basically get to have decent lives”.
Yep, this is an accurate description, but it is worth emphasizing that I think that horrible violent conflict and other bad outcomes for currently alive humans are reasonably likely.
IMO this is an utter loss scenario, to be clear.
I am not that confident about this. Or like, I don’t know, I do notice my psychological relationship to “all the stars explode” and “earth explodes” is very different, and I am not good enough at morality to be confident about dismissing that difference.
There’s definitely some difference, but I still think that the mathematical argument is just pretty strong, and losing a multiple of 1023 of your resources for hosting life and fun and goodness seems to me extremely close to “losing everything”.
@habryka I think you’re making a claim about whether or not the difference matters (IMO it does) but I perceived @Kaj_Sotala to be making a claim about whether “an average reasonably smart person out in society” would see the difference as meaningful (IMO they would not).
(My guess is you interpreted “reasonable people” to mean like “people who are really into reasoning about the world and trying to figure out the truth” and Kaj interpreted reasonable people to mean like “an average person.” Kaj should feel free to correct me if I’m wrong.)
The details matter here! Sometimes when (MIRI?) people say “unaligned AIs might be a bit nice and may not literally kill everyone” the modal story in their heads is something like some brain states of humans are saved in a hard drive somewhere for trade with more competent aliens. And sometimes when other people [1]say “unaligned humans might be a bit nice and may not literally kill everyone” the modal story in their heads is that some X% of humanity may or may not die in a violent coup, but the remaining humans get to live their normal lives on Earth (or even a solar system or two), with some AI survelliance but our subjective quality of life might not even be much worse (and might actually be better).
From a longtermist perspective, or a “dignity of human civilization” perspective, maybe the stories are pretty similar. But I expect “the average person” to be much more alarmed by the first story than the second, and not necessarily for bad reasons.
I don’t want to speak for Ryan or Paul, but at least tentatively this is my position: I basically think the difference from a resource management perspective of whether to keep humans around physically vs copies of them saved is ~0 when you have the cosmic endowment to play with, so small idiosyncratic preferences that’s significant enough to want to save human brain states should also be enough to be okay with keeping humans physically around; especially if humans strongly express a preference for the latter happening (which I think they do).
Note that “everyone will be killed (or worse)” is a different claim from “everyone will be killed”! (And see Oliver’s point that Ryan isn’t talking about mistreated brain scans.)
This is confusing to me; those quotes are compatible with Eliezer and Nate believing that it’s very likely that misaligned AI takeover leads to the deaths of literally all humans.
Perhaps you’re making some point about how if they think it’s at all plausible that it doesn’t lead to everyone dying, they shouldn’t say “building misaligned smarter-than-human systems will kill everyone”. But that doesn’t seem quite right to me: if someone believed event X will happen with 99.99% probability and they wanted to be succinct, I don’t think it’s very unreasonable to say “X will happen” instead of “X is very likely to happen” (as long as when it comes up at all, they’re honest with their estimates).
I agree these quotes are compatible with them thinking that the deaths of literally all humans are likely conditional on misaligned AI takeover.
I also agree that if they think that it is >75% likely that AI will kill literally everyone, then it seems like a reasonable and honest to say “misaligned AI takeover will kill literally everyone”.
I also think it seems fine to describe the situation as “killing literally everyone” even if the AI preserve a subset of humans as brain scans and sell those scans to aliens. (Though probably this should be caveated in various places.
But, I think that they don’t actually put >75% probability on AI killing literally everyone and these quotes are some (though not sufficient) evidence for this. Or more minimally, they don’t seem to have made a case for the AI killing literally everyone which addresses the decision theory counterargument effectively. (I do think Soares and Eliezer have argued for AIs not caring at all aside from decision theory grounds, though I’m also skeptical about this.)
I think that’s the crux here. I don’t think the decision theory counterargument alone would move me from 99% to 75% - there are quite a few other reasons my probability is lower than that, but not purely on the merits of the argument in focus here. I would be surprised if that weren’t the case for many others as well, and very surprised if they didn’t put >75% probably on AI killing literally everyone.
I guess my position comes down to: there are many places where I and presumably you disagree with Nate and Eliezer’s view and think their credences are quite different from ours, and I’m confused by the framing of this particular one as something like “this seems like a piece missing from your comms strategy”. Unless you have better reasons than I do for thinking they don’t put >75% probability on this—which is definitely plausible and may have come up in IRL conversations I wasn’t a part of, in which case I’m wrong.
Based partially on my in-person interactions with Nate and partially on some amalgamated sense from Nate and Eliezer’s comments on the topic, I don’t think they seem very committed to the view “the AI will kill literally everyone”.
Beyond this, I think Nate’s posts on the topic (here, here, and here) don’t seriously engage with the core arguments (listed in my comment) while simultaneously making a bunch of unimportant arguments that totally bury the lede.[1] See also my review of one of these posts here and Paul’s comment here making basically the same point.
I think it seems unfortunate to:
Make X part of your core comms messaging. (Because X is very linguistically nice.)
Make a bunch of posts hypothetically arguing for conclusion X while not really engaging with the best counterarguments and while making a bunch of points that bury the lede.
When these counterarguments are raised, note that you haven’t really thought much about the topic and that this isn’t much of a crux for you because a high fraction of your motivation is longtermist (see here).
Relevant quote from Nate:
To be clear, I think AIs might kill huge numbers of people. Also, whether misaligned AI takeover kills everyone with >90% probability or kills billions with 50% probability doesn’t affect the bottom line for stopping takeover much from most people’s perspective! I just think it would be good to fix the messaging here to something more solid.
(I have a variety of reasons for thinking this sort of falsehood is problematic which I could get into as needed.)
Edit: note that some of these posts make correct points about unrelated and important questions (e.g. making IMO correct arguments that you very likely can’t bamboozle a high fraction of resources out of an AI using decision theory); I’m just claiming that, with respect to the question of “will the AI kill all humans”, these posts fail to engage with the strongest arguments and bury the lede.
Two things:
For myself, I would not feel comfortable using language as confident-sounding as “on the default trajectory, AI is going to kill everyone” if I assigned (e.g.) 10% probability to “humanity [gets] a small future on a spare asteroid-turned-computer or an alien zoo or maybe even star”. I just think that scenario’s way, way less likely than that.
I’d be surprised if Nate assigns 10+% probability to scenarios like that, but he can speak for himself. 🤷♂️
I think some people at MIRI have significantly lower p(doom)? And I don’t expect those people to use language like “on the default trajectory, AI is going to kill everyone”.
I agree with you that there’s something weird about making lots of human-extinction-focused arguments when the thing we care more about is “does the cosmic endowment get turned into paperclips”? I do care about both of those things, an enormous amount; and I plan to talk about both of those things to some degree in public communications, rather than treating it as some kind of poorly-kept secret that MIRI folks care about whether flourishing interstellar civilizations get a chance to exist down the line. But I have this whole topic mentally flagged as a thing to be thoughtful and careful about, because it at least seems like an area that contains risk factors for future deceptive comms. E.g., if we update later to expecting the cosmic endowment to be wasted but all humans not dying, I would want us to adjust our messaging even if that means sacrificing some punchiness in our policy outreach.
Currently, however, I think the particular scenario “AI keeps a few flourishing humans around forever” is incredibly unlikely, and I don’t think Eliezer, Nate, etc. would say things like “this has a double-digit probability of happening in real life”? And, to be honest, the idea of myself and my family and friends and every other human being all dying in the near future really fucks me up and does not seem in any sense OK, even if (with my philosopher-hat on) I think this isn’t as big of a deal as “the cosmic endowment gets wasted”.
So I don’t currently feel bad about emphasizing a true prediction (“extremely likely that literally all humans literally nonconsensually die by violent means”), even though the philosophy-hat version of me thinks that the separate true prediction “extremely likely 99+% of the potential value of the long-term future is lost” is more morally important than that. Though I do feel obliged to semi-regularly mention the whole “cosmic endowment” thing in my public communication too, even if it doesn’t make it into various versions of my general-audience 60-second AI risk elevator pitch.
Thanks, this is clarifying from my perspective.
My remaining uncertainty is why you think AIs are so unlikely to keep humans around and treat them reasonably well (e.g. let them live out full lives).
From my perspective the argument that it is plausible that humans are treated well [even if misaligned AIs end up taking over the world and gaining absolute power] goes something like this:
If it only costs <1/million of overall resources to keep a reasonable fraction of humans alive and happy, it’s reasonably likely that misaligned AIs with full control would keep humans alive and happy due to either:
Acausal trade/decision theory
The AI terminally caring at least a bit about being nice to humans (perhaps because it cares a bit about respecting existing nearby agents, or perhaps because it has at least a bit of human-like values).
It is pretty likely that it costs <1/million of overall resources (from the AI’s perspective) to keep a reasonable fraction of humans alive and happy. Humans are extremely cheap to keep around asymptotically, and I think it can be pretty cheap even initially, especially if you’re a very smart AI. (See the rough back-of-envelope sketch below for a sense of scale.)
(See links in my prior comment for more discussion.)
(I also think the argument goes through for 1/billion, but I thought I would focus on the higher value for now.)
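For a rough sense of scale on that “<1/million” premise, here is a minimal back-of-envelope sketch in Python. The figures are standard approximate estimates, and treating “overall resources” as the Sun’s full power output is an assumption of the sketch, not something claimed in the argument above.

```python
# Rough sense of scale for the "<1/million of overall resources" premise.
# Figures are approximate public estimates; treating "overall resources" as
# the Sun's full power output is an assumption of this sketch.

solar_output_w = 3.8e26      # total power output of the Sun (watts)
earth_sunlight_w = 1.7e17    # sunlight intercepted by the whole Earth (watts)
human_energy_use_w = 2e13    # current world primary energy use (~600 EJ/year)

print(f"human energy use / solar output:        {human_energy_use_w / solar_output_w:.1e}")
print(f"all of Earth's sunlight / solar output: {earth_sunlight_w / solar_output_w:.1e}")
# ~5e-14 and ~4e-10 respectively -- both far below 1/1,000,000, even before
# counting any resources beyond the solar system.
```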
Where do you disagree with this argument?
FWIW I still stand behind the arguments that I made in that old thread with Paul. I do think the game-theoretical considerations for AI maybe allowing some humans to survive are stronger, but they also feel loopy and like they depend on how good of a job we do on alignment, so I usually like to bracket them in conversations like this (though I agree it’s relevant for the prediction of whether AI will kill literally everyone).
[minor]
Worth noting that they might only depend on it to some extent, as mediated by the correlation between our success and the aliens’ success.
Highly competent aliens that care a bunch about existing beings not being killed seem pretty plausible to me.
Is there a writeup somewhere of how we’re likely to get “around a billion people die” that isn’t extinction, or close to it? Something about this phrasing feels weird/suspicious to me.
Like, I have a few different stories for everyone dying (some sooner, some later).
I have some stories where like “almost 8 billion people” die and the AI scans the remainder.
I have some stories where the AI doesn’t really succeed and maybe kills millions of people, in what is more like “a major industrial accident” than “a powerful superintelligence enacting its goals”.
Technically “substantial chance of at least 1 billion people dying” can imply the middle option there, but it sounds like you mean the central example to be closer to a billion than 7.9 billion or whatever. That feels like a narrow target and I don’t really know what you have in mind.
Thinking a bit more, scenarios that seem at least kinda plausible:
“misuse”, where someone is just actively trying to use AI to commit genocide or similar. Or we get into a humans+AI vs humans+AI war.
the AI economy takes off, it has lots of extreme environmental impact, and it’s sort of aligned but we’re not very good at regulating it fast enough, but we get it under control after a billion deaths.
Some more:
The AI kills a huge number of people with a bioweapon to destabilize the world and relatively advantage its position.
Massive world war/nuclear war. This could kill 100s of millions easily. 1 billion is probably a bit on the higher end of what you’d expect.
The AI has control of some nations, but thinks that some subset of humans over which it has control pose a net risk such that mass slaughter is a good option.
AIs would prefer to keep humans alive, but there are multiple misaligned AI factions racing and this causes extreme environmental damage.
I think “crazy large scale conflict (with WMDs)” or “mass slaughter to marginally increase odds of retaining control” or “extreme environmental issues” are all pretty central in what I’m imagining.
I think the number of deaths for these is maybe log normally distributed around 1 billion or so. That said, I’m low confidence.
(For reference, if the same fraction of people died as in WW2, it would be around 300 million. So, my view is similar to “substantial chance of a catastrophe which is a decent amount worse than WW2”.)
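A small Python sketch of the two quantitative claims above: scaling WW2’s death toll to today’s population, and one way of picturing “log-normally distributed around 1 billion or so”. The WW2 and population figures are rough, and the spread parameter sigma is an assumed illustrative value rather than anything stated in the comment.

```python
import numpy as np

# Scaling WW2's death toll to today's population (rough figures).
ww2_deaths = 85e6     # commonly cited estimates run roughly 70-85 million
pop_1940 = 2.3e9      # approximate world population circa 1940
pop_today = 8.0e9     # approximate world population today

fraction = ww2_deaths / pop_1940
print(f"WW2 death fraction: {fraction:.1%}")                                             # ~3.7%
print(f"Same fraction of today's population: {fraction * pop_today / 1e6:.0f} million")  # ~300 million

# One way to picture "log-normally distributed around 1 billion or so".
# sigma is an assumed spread parameter (not stated in the comment above).
rng = np.random.default_rng(0)
samples = rng.lognormal(mean=np.log(1e9), sigma=1.0, size=100_000)
p10, p50, p90 = np.percentile(samples, [10, 50, 90])
print(f"10th/50th/90th percentile deaths: {p10:.2e}, {p50:.2e}, {p90:.2e}")
# roughly 0.28 billion / 1 billion / 3.6 billion at these percentiles
```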
That’s a bizarre bar to me! 50%!? I’d be worried if it was 5%.
I’m not arguing that you shouldn’t be worried. I’m worried, and I work on reducing AI risk as my full-time job. I’m just arguing that it doesn’t seem like true and honest messaging (in the absence of the various interventions I proposed at the bottom of my comment).
Okay, then what are your actual probabilities? I’m guessing it’s not sub-20%, otherwise you wouldn’t just say “<50%”, because for me preventing a, say, 10% chance of extinction is much more important than even a 99% chance of 2B people dying. And your comment was specifically dismissing focus on full extinction due to the <50% chance.
My current view is that conditional on ending up with full misaligned AI control:
20% extinction
50% chance that >1 billion humans die or suffer an outcome at least as bad as death.
I don’t see why this would be true:
From a longtermist perspective, we lose control over the lightcone either way (we’re conditioning on full misaligned AI control).
From a perspective where you just care about currently alive beings on planet earth, I don’t see why extinction is that much worse.
From a perspective in which you just want some being to be alive somewhere, I think that expansive notions of the universe/multiverse virtually guarantee this (but perhaps you dismiss this for some reason).
Also, to be clear, perspectives 2 and 3 don’t seem very reasonable to me as terminal philosophical views (rather than e.g. heuristics), as they privilege time and locations in space in a pretty specific way.
I have a preference for minds close to mine continuing to exist, assuming their lives are worth living. If it’s misaligned enough that the remaining humans don’t have good lives, then yes, it doesn’t matter, but I’d just lead with that rather than with the deaths.
And if they do have lives worth living and don’t end up being the last humans, then that leaves us with a lot more positive-human-lived-seconds in the 2B death case.
This view as stated seems very likely to be satisfied by e.g. everett branches. (See (3) on my above list.)
Sure, but 1. I only put 80% or so on MWI/MUH etc., and 2. I’m talking about optimizing for more positive-human-lived-seconds, not just for a binary “I want some humans to keep living”.
Then why aren’t you mostly dominated by the possibility of >10^50 positive-human-lived-seconds via human control of the light cone?
Maybe some sort of diminishing returns?
I am dominated by it, and okay, I see what you are saying. Whichever scenario results in a higher chance of human control of the light cone is the one I prefer, and these considerations are relevant only where we don’t control it.
I really want to be able to simply convey that I am worried about outcomes which are similarly bad to “AIs kill everyone”. I put less than 50% on the AIs, conditional on takeover, leaving humans alive because of something like “kindness”. I do think the decision-theoretic reasons are maybe stronger, but I also don’t think that is the kind of thing one can convey to the general public.
I think it might be good to have another way of describing the bad outcomes I am worried about.
I like your suggestion of “AIs kill high fractions of humanity, including their children”, although it’s a bit clunky. Some other options, though I’m still not super confident they’re better:
AIs totally disempower humanity (I’m worried people will be like “Oh, but aren’t we currently disempowered by capitalism/society/etc”)
Overthrow the US government (maybe good for NatSec stuff, but doesn’t convey the full extent)
My two cents RE particular phrasing:
When talking to US policymakers, I don’t think there’s a big difference between “causes a national security crisis” and “kills literally everyone.” Worth noting that even though many in the AIS community see a big difference between “99% of people die but civilization restarts” vs. “100% of people die”, IMO this distinction does not matter to most policymakers (or at least matters way less to them).
Of course, in addition to conveying “this is a big deal” you need to convey the underlying threat model. There are lots of ways to interpret “AI causes a national security emergency” (e.g., China, military conflict). “Kills literally everyone” probably leads people to envision a narrower set of worlds.
But IMO even “kills literally everybody” doesn’t really convey the underlying misalignment/AI takeover threat model.
So my current recommendation (weakly held) is probably to go with “causes a national security emergency” or “overthrows the US government” and then accept that you have to do some extra work to actually get them to understand the “AGI → AI takeover → lots of people die and we lose control” model.
See my other comment here for reference:
I agreed, but initially downvoted this for being obviously unproductive, then upvoted it for being an exquisite proof by absurdity about what’s productive: this is the first time I have seen clearly how good communication must forbid some amount of nuance.
The insight: You have a limited amount of time to communicate arguments and models; methods for reproducing some of your beliefs. With most people, you will never have enough time to transmit our entire technoeschatology or xenoeconomics stuff. It is useless to make claims about it, as the recipient has no way of checking them for errors or deceptions. You can only communicate approximations and submodules. No one will ever see the whole truth. (You do not see the whole truth. Your organization, even just within itself, will never agree about the whole truth.)
I don’t think you should generally upvote things on the basis of indirectly explaining things via being unproductive lol.
I guess in this case I’m arguing that it’s accidentally, accidentally, productive.
I wrote [a two-paragraph explanation](https://www.lesswrong.com/posts/4ceKBbcpGuqqknCj9/the-two-paragraph-argument-for-ai-risk) of AI doom not too long ago.
I think this still means MIRI is correct when it comes to the expected value, though.
If you’re a longtermist, sure.
If you just want to survive, not clearly.