Further, as far as I can tell, central thought leaders of MIRI (Eliezer, Nate Soares) don’t actually believe that misaligned AI takeover will lead to the deaths of literally all humans:
This is confusing to me; those quotes are compatible with Eliezer and Nate believing that it’s very likely that misaligned AI takeover leads to the deaths of literally all humans.
Perhaps you’re making some point about how if they think it’s at all plausible that it doesn’t lead to everyone dying, they shouldn’t say “building misaligned smarter-than-human systems will kill everyone”. But that doesn’t seem quite right to me: if someone believed event X will happen with 99.99% probability and they wanted to be succinct, I don’t think it’s very unreasonable to say “X will happen” instead of “X is very likely to happen” (as long as when it comes up at all, they’re honest with their estimates).
I agree these quotes are compatible with them thinking that the deaths of literally all humans are likely conditional on misaligned AI takeover.
I also agree that if they think that it is >75% likely that AI will kill literally everyone, then it seems reasonable and honest to say “misaligned AI takeover will kill literally everyone”.
I also think it seems fine to describe the situation as “killing literally everyone” even if the AI preserves a subset of humans as brain scans and sells those scans to aliens. (Though this should probably be caveated in various places.)
But, I think that they don’t actually put >75% probability on AI killing literally everyone, and these quotes are some (though not sufficient) evidence for this. Or more minimally, they don’t seem to have made a case for the AI killing literally everyone which addresses the decision theory counterargument effectively. (I do think Soares and Eliezer have argued that AIs won’t care at all, setting aside decision theory grounds, though I’m also skeptical about this.)
they don’t seem to have made a case for the AI killing literally everyone which addresses the decision theory counterargument effectively.
I think that’s the crux here. I don’t think the decision theory counterargument alone would move me from 99% to 75%; there are quite a few other reasons my probability is lower than that, but not purely on the merits of the argument in focus here. I would be surprised if that weren’t the case for many others as well, and very surprised if they didn’t put >75% probability on AI killing literally everyone.
I guess my position comes down to: There are many places where I and presumably you disagree with Nate and Eliezer’s view and think their credences are quite different from ours, and I’m confused by the framing of this particular one as something like “this seems like a piece missing from your comms strategy”. Unless you have better reasons than I for thinking they don’t put >75% probability on this—which is definitely plausible and may have happened in IRL conversations I wasn’t a part of, in which case I’m wrong.
I’m confused by the framing of this particular one as something like “this seems like a piece missing from your comms strategy”. Unless you have better reasons than I for thinking they don’t put >75% probability on this—which is definitely plausible and may have happened in IRL conversations I wasn’t a part of, in which case I’m wrong.
Based partially on my in-person interactions with Nate and partially on some amalgamated sense from Nate and Eliezer’s comments on the topic, I don’t think they seem very committed to the view “the AI will kill literally everyone”.
Beyond this, I think Nate’s posts on the topic (here, here, and here) don’t seriously engage with the core arguments (listed in my comment) while simultaneously making a bunch of unimportant arguments that totally bury the lede.[1] See also my review of one of these posts here and Paul’s comment here making basically the same point.
I think it seems unfortunate to:
Make X part of your core comms messaging. (Because X is very linguistically nice.)
Make a bunch of posts hypothetically arguing for conclusion X while not really engaging with the best counterarguments and while making a bunch of points that bury the lede.
When these counterarguments are raised, note that you haven’t really thought much about the topic and that this isn’t much of a crux for you because a high fraction of your motivation is longtermist (see here).
Relevant quote from Nate:
I am not trying to argue with high confidence that humanity doesn’t get a small future on a spare asteroid-turned-computer or an alien zoo or maybe even star if we’re lucky, and acknowledge again that I haven’t much tried to think about the specifics of whether the spare asteroid or the alien zoo or distant simulations or oblivion is more likely, because it doesn’t much matter relative to the issue of securing the cosmic endowment in the name of Fun.
To be clear, I think AIs might kill huge numbers of people. Also, whether misaligned AI takeover kills everyone with >90% probability or kills billions with 50% probability doesn’t affect the bottom line for stopping takeover much from most people’s perspective! I just think it would be good to fix the messaging here to something more solid.
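To illustrate why the bottom line barely moves, here is a toy expected-deaths comparison; the probabilities and death tolls are made-up round numbers for illustration, not anyone’s actual estimates:

```python
# Toy comparison of expected deaths conditional on misaligned AI takeover
# under two framings. All numbers are illustrative placeholders.
world_population = 8e9

# Framing A: ">90% chance the takeover kills literally everyone"
expected_deaths_a = 0.9 * world_population   # ~7.2 billion

# Framing B: "50% chance the takeover kills (say) four billion people"
expected_deaths_b = 0.5 * 4e9                # ~2.0 billion

print(f"Framing A: ~{expected_deaths_a:.1e} expected deaths")
print(f"Framing B: ~{expected_deaths_b:.1e} expected deaths")
# Both framings imply billions of expected deaths, so the practical case
# for preventing takeover changes little between them.
```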
(I have a variety of reasons for thinking this sort of falsehood is problematic which I could get into as needed.)
Edit: note that some of these posts make correct points about unrelated and important questions (e.g., making IMO correct arguments that you very likely can’t bamboozle a high fraction of resources out of an AI using decision theory); I’m just claiming that, with respect to the question of “will the AI kill all humans”, these posts fail to engage with the strongest arguments and bury the lede.
Two things:

For myself, I would not feel comfortable using language as confident-sounding as “on the default trajectory, AI is going to kill everyone” if I assigned (e.g.) 10% probability to “humanity [gets] a small future on a spare asteroid-turned-computer or an alien zoo or maybe even star”. I just think that scenario’s way, way less likely than that.
I’d be surprised if Nate assigns 10+% probability to scenarios like that, but he can speak for himself. 🤷‍♂️
I think some people at MIRI have significantly lower p(doom)? And I don’t expect those people to use language like “on the default trajectory, AI is going to kill everyone”.
I agree with you that there’s something weird about making lots of human-extinction-focused arguments when the thing we care more about is “does the cosmic endowment get turned into paperclips”? I do care about both of those things, an enormous amount; and I plan to talk about both of those things to some degree in public communications, rather than treating it as some kind of poorly-kept secret that MIRI folks care about whether flourishing interstellar civilizations get a chance to exist down the line. But I have this whole topic mentally flagged as a thing to be thoughtful and careful about, because it at least seems like an area that contains risk factors for future deceptive comms. E.g., if we update later to expecting the cosmic endowment to be wasted but all humans not dying, I would want us to adjust our messaging even if that means sacrificing some punchiness in our policy outreach.
Currently, however, I think the particular scenario “AI keeps a few flourishing humans around forever” is incredibly unlikely, and I don’t think Eliezer, Nate, etc. would say things like “this has a double-digit probability of happening in real life”? And, to be honest, the idea of myself and my family and friends and every other human being all dying in the near future really fucks me up and does not seem in any sense OK, even if (with my philosopher-hat on) I think this isn’t as big of a deal as “the cosmic endowment gets wasted”.
So I don’t currently feel bad about emphasizing a true prediction (“extremely likely that literally all humans literally nonconsensually die by violent means”), even though the philosophy-hat version of me thinks that the separate true prediction “extremely likely 99+% of the potential value of the long-term future is lost” is more morally important than that. Though I do feel obliged to semi-regularly mention the whole “cosmic endowment” thing in my public communication too, even if it doesn’t make it into various versions of my general-audience 60-second AI risk elevator pitch.
Thanks, this is clarifying from my perspective.

My remaining uncertainty is why you think AIs are so unlikely to keep humans around and treat them reasonably well (e.g., let them live out full lives).
From my perspective the argument that it is plausible that humans are treated well [even if misaligned AIs end up taking over the world and gaining absolute power] goes something like this:
If it cost no more than 1/million of overall resources to keep a reasonable fraction of humans alive and happy, it’s reasonably likely that misaligned AIs with full control would keep humans alive and happy due to either:
Acausal trade/decision theory
The AI terminally caring at least a bit about being nice to humans (perhaps because it cares a bit about respecting existing nearby agents, or perhaps because it has at least a bit of human-like values).
It is pretty likely that it costs <1/million of overall resources (from the AI’s perspective) to keep a reasonable fraction of humans alive and happy. Humans are extremely cheap to keep around asymptotically, and I think it can be pretty cheap even initially, especially if you’re a very smart AI. (A rough back-of-envelope sketch of the asymptotic cheapness follows below, after the parentheticals.)
(See links in my prior comment for more discussion.)
(I also think the argument goes through for 1/billion, but I thought I would focus on the higher value for now.)
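Here is the rough sketch referenced above. It uses standard physical figures plus a deliberately generous made-up multiplier (the 100× budget is an arbitrary assumption, not an estimate of what an AI would actually spend):

```python
# Back-of-envelope: cost of sustaining current humanity relative to the Sun alone.
# The 100x "generosity factor" is an arbitrary illustrative assumption.
solar_luminosity_w = 3.8e26                         # total power output of the Sun, watts
current_civilization_w = 2e13                       # rough current human primary energy use, watts
generous_budget_w = 100 * current_civilization_w    # assume 100x today's usage keeps humans happy

fraction_of_sun = generous_budget_w / solar_luminosity_w
print(f"Fraction of the Sun's output: ~{fraction_of_sun:.0e}")  # ~5e-12

# ~5e-12 is far below 1/million (1e-6) and even 1/billion (1e-9), and the Sun
# itself is a tiny sliver of the resources a spacefaring AI could eventually control.
```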
Where do you disagree with this argument?