Humanity isn’t remotely longtermist, so arguments for AGI x-risk should focus on the near term
Toby Ord recently published a nice piece, On the Value of Advancing Progress, about mathematical projections of far-future outcomes under different rates of progress and levels of risk. The problem with that piece, and with many other arguments for caution, is that most people barely care about possibilities even twenty years out.
We could talk about sharp discounting curves in decision-making studies, and how that makes sense given evolutionary pressures in tribal environments. But I think this is pretty obvious from talking to people and watching our political and economic practices.
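To give a rough sense of just how sharp those curves are: the standard hyperbolic model from that literature values a payoff of size $A$ delayed by $D$ years at

$$V(D) = \frac{A}{1 + kD}$$

With an illustrative (not empirically fitted) discount rate of $k = 1$ per year, something twenty years out is worth $A/21$, about 5% of its present value. The exact parameter varies a lot across studies, but the qualitative shape is the same: value falls off very quickly over the first few years, and long-run outcomes barely register.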
Utilitarianism is a nicely self-consistent value system. Utilitarianism pretty clearly implies longtermism. Most people don’t care that much about logical consistency,[1] so they are happily non-utilitarian and non-longtermist in a variety of ways. Many arguments for AGI safety are longtermist, or at least long-term, so they’re not going to work well for most of humanity.
This point is fairly obvious, but it's worth keeping in mind.
One non-obvious corollary of this observation is that much skepticism about AGI x-risk is probably rooted in skepticism that AGI will happen soon. This doesn’t explain all skepticism, but it’s a significant factor worth addressing. When people dig into their logic, that’s often the central point. They start out saying “AGI wouldn’t kill humans,” but over the course of a conversation it turns out they feel that way primarily because they don’t think real AGI will happen in their lifetimes. From there, no discussion of AGI x-risk is productive, because they just don’t care about it.
The obvious counterpoint is “You’re pretty sure it won’t happen soon? I didn’t know you were an expert in AI or cognition!” Please don’t say this; nothing makes people cling to their positions beyond all logic like being called stupid.[2] Something like this works better: “Well, a lot of people with the most relevant expertise think it will happen pretty soon. A bunch more think it will take longer. So I just assume I don’t know which is right, and it might very well happen pretty soon.”
It looks to me like discussing whether AGI might threaten humans is pretty pointless if the person is still assuming it’s not going to happen for a long time. Once you’re past that, it might make sense to actually talk about why you think AGI would be risky for humans.[3]
This is an aside, but you’ll probably find that utilitarianism isn’t that much more logical than other value systems anyway. Preferring what your brain wants you to prefer, while avoiding drastic inconsistency, has practical advantages over values that are more consistent but clash with your felt emotions. So let’s not assume humanity fails to be utilitarian simply out of stupidity.
Making sure any discussions you have about x-risk are pleasant for everyone involved is probably the most important strategy of all. I strongly suspect that personal affinity weighs more heavily than logic on average, even for fairly intellectual people. (Rationalists are a special case; I think we’re resistant, but not immune, to motivated reasoning.)
So making a few points in a pleasant way, then moving on to other topics they like, is probably way better than making the perfect logical argument while even slightly irritating them.
From there, you might actually get to the real discussion of why AGI might threaten humans. Here are some arguments I’ve seen be convincing.
People often seem to think “okay, fine, it might happen soon, but surely AI smarter than us still won’t have free will and make its own goals.” From there you could point out that an AGI needs goals to be useful, and if it misunderstands those goals even slightly, the results could be very bad. Stuart Russell’s “you can’t fetch the coffee if you’re dead” is my favorite intuitive explanation of how instrumental convergence creates unexpected consequences. You do have to explain that we wouldn’t screw it up in quite such an obvious way, but the metaphor extends well into the subtler complexities of goals and logic.
The other big points, in my observation, are “people screw up complex projects a lot, especially on the first try” and “you’d probably think it was dangerous if advanced aliens were landing, right?” One final intuitive point: even if AGIs do always follow human instructions correctly, some human will eventually give them very bad instructions, accidentally or deliberately.