Dependencies for AGI pessimism
Epistemic status: mildly confident
One must believe (with at least minimal confidence) in all of the following points in order to believe that AGI poses a unique existential risk:
A) AGI (at a superhuman level) is possible
B) AGI has a significant chance of finding value in destroying humanity (as we think of it, at least)
C) AGI will be capable of killing all or nearly all humans in ways that non-superintelligent-AGI agents cannot, or likely will not, manage.
If you believe that all of the above are true, are you forced to worry about existential AI risk? Not necessarily. Here are some other, more subtle/fundamental premises one must accept:
1. Humanity should not be destroyed.
2. Humanity can in practice be destroyed, as a general idea. (There may be some religious/philosophical views which don’t believe extinction is possible. This is more of a generalization of C than a separate dependency.)
3. It’s possible to have an effect on the risk level from superintelligence. (If not, there’s no use worrying about it.)
4. There is no other near-term existential risk which is orders of magnitude more likely. (If there is, it’s justifiable not to be concerned, for the same reason that we aren’t deeply preoccupied with near-term asteroid impact risk.)
5. One should be concerned about risks which one may be able to help prevent.
6. One should care about the long-term future, and about risks to others outside the self. (This is also a dependency for 1, but not identical, because it’s possible in theory to be both a longtermist and a misanthrope.)
7. Taking practical steps based on logical thinking is a reasonable way to deal with the world. (If you don’t believe in logic, then you can probably contradict yourself and hold everything else to be true while still not changing your mind? I’m not sure this one is necessary to include, but I may be wrong.)
If a counter-example exists (someone who is concerned about existential risk from AI but doesn’t believe all of the above, or vice versa), please let me know and I will update accordingly.
B’) AGI has a significant chance of finding most of its value only in things that the CEV of humanity doesn’t particularly value.
This works even if the CEV of humanity decides that humanity’s existence is worse than some better alternative, while the CEV of an AGI optimizes for things that are, by humanity’s CEV, worse than humanity’s existence (such as its nonexistence in the absence of those better alternatives).
The alternative to B’ is that the CEV of AGIs and the CEV of humanity somehow agree a lot. I think this is plausible if most of the (influential) eventual values of human civilization are not things we are presently aware of, and won’t come to be aware of for a very long time, and if their formulation is generated by principles (like curiosity) that are not particularly human-specific: things like math, but extending to much greater complexity. In that case these “generic” values might be shared by the similarly mostly-implicit values of AGIs (that is, values the AGIs are not aware of but would accept after very long reflection), though of course only if the AGIs are not carefully engineered as optimizers with tractable values to demonstrate the orthogonality thesis.
(This is the kind of unknowable hypothesis that alignment engineering should never rely on, but at the same time its truth shouldn’t break alignment; alignment should be engineered to survive such hypotheses.)
I expect to find that memorization and compression eventually make an AGI wish to become an archeologist and at minimum retain memory of humanity in a way that an initially misaligned hard ASI might not; but this is little reassurance, as most pivotal anti-humanity acts are within the reach of a soft ASI that will fail to care for itself after taking over. The most likely outcome I currently see is an incremental species replacement where AI replaces humanity as the world’s dominant species over the next 20 years. No strongly coherent planning AI need come into existence in that time in order for a weakly consistent but strongly capable AI to kill or permanently disempower humanity.
(A) doesn’t seem necessary; it’s just that the most straightforward path goes through it. If superhuman AGI turned out to be impossible for some reason, I do believe that AGI would still be a major existential risk, greater than any other we face now, though less severe than the superhuman case. It would also be extremely surprising.
(B) is totally unnecessary. There are many ways that AGI (especially superhuman AGI) could result in the extinction of humanity without ever actually valuing extinction of humanity. I think they make up the bulk of the paths to extinction, so ruling out (B) would not decrease my concern over existential risks by more than a few percent.
(C) should really be called (A1), since it’s something that drastically multiplies the risk if and only if superhuman AGI is possible. If we could somehow rule out (C) then it would reduce existential risk, but not by nearly as much as eliminating (A). It would also be extremely surprising to find out that our own level of intelligence and development somehow already covers every way of killing humans that the universe allows.
(1), (3), and (5)-(9) don’t address the question of whether “one must believe (with at least minimal confidence) in all of the following points in order to believe that AGI poses a unique existential risk”. An existential risk doesn’t become not an existential risk just because you think it’s okay if our species dies or that we can’t or shouldn’t do anything about it anyway.
(2) is the first one that I find actually necessary, albeit in a tautological sense. You logically can’t have an existential risk to something that fundamentally cannot be destroyed. I would be very surprised to find out that humanity has such plot armour.
(4) is tautologically necessary, because of the word “unique” in the proposition. Even then, I don’t think the word “unique” is salient to me. If there turned out to be more than one existential risk of similar or greater magnitude, that would not make anything better about AGI x-risk. It would just mean that the world is even worse.
A) It doesn’t even have to be at superhuman levels, per se. While that would be a lot more dangerous, a very scalable human-level AGI could also be a big problem. Though I suppose that scalability would itself count as superhuman…
C) Killing everyone is the extreme case (or one of them). A less extreme but more general version is the removal of agency, where humanity’s potential is taken over by AIs: something like humans being moved to nature reserves where they can live in picturesque villages but have no influence on the future.
That being said, the fundamental points seem quite good, though I’d quibble with:
3 - All kinds of final stands and kamikaze attacks are undertaken without hope of them working, in order to go out in a blaze of glory, or at least to die with a sword in your hand.
4 - It’s justifiable if you don’t have any other context. You can argue from expected value that if there are 1000 researchers, then they should choose what to work on based on some combination of:
the probability of solving the problem
the probability of it happening
the severity of it happening
So even if a given risk is very unlikely, the expected negative utility could be very large, making it worth assigning 1 or 2 researchers to it, just in case (see the rough sketch below).
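To make that concrete, here is a minimal sketch of the allocation this expected-value argument suggests. The risks, probabilities, and severities are made-up illustrative numbers (not estimates from the post or this comment), and the `priority` function is just one hypothetical way of combining the three factors above.

```python
# Illustrative sketch of the expected-value argument above.
# All probabilities and severities are made up for the example.

risks = {
    # name: (p_solve, p_occur, severity in arbitrary "badness" units)
    "asteroid impact": (0.9, 2e-6, 1e10),
    "engineered pandemic": (0.3, 1e-3, 1e9),
    "misaligned AGI": (0.1, 1e-2, 1e10),
}

def priority(p_solve: float, p_occur: float, severity: float) -> float:
    """Tractability-weighted expected harm: chance of solving it times
    chance it happens times how bad it would be."""
    return p_solve * p_occur * severity

total = sum(priority(*params) for params in risks.values())
researchers = 1000

for name, params in risks.items():
    share = priority(*params) / total
    print(f"{name}: {share:.2%} of effort, ~{round(share * researchers)} researchers")
```

With these made-up numbers, even the least likely risk still ends up with a researcher or two, which is the “just in case” point: a tiny probability multiplied by a huge severity is not a rounding error.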
5 - This sounds rational, but humans often aren’t, and will be concerned about whatever is most available to them rather than about whether their concern is helpful. People like to approach someone in trouble and ask if they can help, even when they’re pretty sure there is nothing to be done.
6 - You could argue that caring about the future is also an extension of caring for the self, if you believe that your children (or ideas) are in some way an extension of yourself.
7 - I’d say that this is totally orthogonal to how much someone cares about existential risks, or anything, really. I might be misunderstanding what you mean by “worry” here. But even if you mean something like “being worried enough about this problem to actively try to make it better”, you can find examples everywhere of people irrationally worried about things and then attempting irrational methods to solve the problem. This point is good to include in a list of things that someone who wants to fix things should possess, but even then it’s not strictly needed: you can make things better purely at random. It’s not effective, but it sometimes works.