I really want a simple way to convey that I am worried about outcomes which are similarly bad to “AIs kill everyone”. I put less than 50% probability on the AIs leaving humans alive, conditional on takeover, because of something like “kindness”. I do think the decision-theoretic reasons are maybe stronger, but I also don’t think that is the kind of thing one can convey to the general public.
I think it might be good to have another way of describing the bad outcomes I am worried about.
I like your suggestion of “AIs kill high fractions of humanity, including their children”, although it’s a bit clunky. Some other options, though I’m not confident they’re better:
AIs totally disempower humanity (I’m worried people will be like “Oh, but aren’t we currently disempowered by capitalism/society/etc”)
AIs overthrow the US government (maybe good for NatSec framing, but doesn’t convey the full extent of the harm)
My two cents RE particular phrasing:
When talking to US policymakers, I don’t think there’s a big difference between “causes a national security crisis” and “kills literally everyone.” Worth noting that even though many in the AIS community see a big difference between “99% of people die but civilization restarts” vs. “100% of people die”, IMO this distinction does not matter to most policymakers (or at least matters way less to them).
Of course, in addition to conveying “this is a big deal” you need to convey the underlying threat model. There are lots of ways to interpret “AI causes a national security emergency” (e.g., China, military conflict). “Kills literally everyone” probably leads people to envision a narrower set of worlds.
But IMO even “kills literally everybody” doesn’t really convey the underlying misalignment/AI takeover threat model.
So my current recommendation (weakly held) is probably to go with “causes a national security emergency” or “overthrows the US government” and then accept that you have to do some extra work to actually get them to understand the “AGI --> AI takeover --> lots of people die and we lose control” model.
See my other comment here for reference: