Humanity isn’t remotely longtermist, so arguments for AGI x-risk should focus on the near term
Toby Ord recently published a nice piece, “On the Value of Advancing Progress,” about mathematical projections of far-future outcomes given different rates of progress and different risk levels. The problem with that piece, and with many arguments for caution, is that most people barely care about possibilities even twenty years out.
We could talk about sharp discounting curves in decision-making studies, and how that makes sense given evolutionary pressures in tribal environments. But I think this is pretty obvious from talking to people and watching our political and economic practices.
Utilitarianism is a nicely self-consistent value system. Utilitarianism pretty clearly implies longtermism. Most people don’t care that much about logical consistency,[1] so they are happily non-utilitarian and non-longtermist in a variety of ways. Many arguments for AGI safety are longtermist, or at least long-term, so they’re not going to work well for most of humanity.
This is a fairly obvious point, but one worth keeping in mind.
One non-obvious corollary of this observation is that much skepticism about AGI x-risk is probably based on skepticism about AGI happening soon. This doesn’t explain all skepticism, but it’s a significant factor worth addressing. When people dig into their logic, that’s often a central point. They start out saying “AGI wouldn’t kill humans,” but over the course of a conversation it turns out that they feel that way primarily because they don’t think real AGI will happen in their lifetimes. From there, any discussion of AGI x-risk isn’t productive, because they just don’t care about it.
The obvious counterpoint is “You’re pretty sure it won’t happen soon? I didn’t know you were an expert in AI or cognition!” Please don’t say this; nothing convinces your opponents to cling to their positions beyond all logic like calling them stupid.[2] Try something like “Well, a lot of people with the most relevant expertise think it will happen pretty soon. A bunch more think it will take longer. So I just assume I don’t know which is right, and it might very well happen pretty soon.”
It looks to me like discussing whether AGI might threaten humans is pretty pointless if the person is still assuming it’s not going to happen for a long time. Once you’re past that, it might make sense to actually talk about why you think AGI would be risky for humans.[3]
1. ^ This is an aside, but you’ll probably find that utilitarianism isn’t that much more logical than other value systems anyway. Preferring what your brain wants you to prefer, while avoiding drastic inconsistency, has practical advantages over values that are more consistent but clash with your felt emotions. So let’s not assume humanity’s non-utilitarianism is just stupidity.
2. ^ Making sure any discussions you have about x-risk are pleasant for all involved is probably actually the most important strategy. I strongly suspect that personal affinity weighs more heavily than logic on average, even for fairly intellectual people. (Rationalists are a special case; I think we’re resistant but not immune to motivated reasoning.)
So making a few points in a pleasant way, then moving on to other topics they like is probably way better than making the perfect logical argument while even slightly irritating them.
3. ^ From there you might be having the actual discussion of why AGI might threaten humans. Here are some things I’ve seen be convincing.
People often seem to think “okay, fine, it might happen soon, but surely AI smarter than us still won’t have free will and make its own goals.” From there you could point out that an AI needs goals to be useful, and that if it misunderstands those goals even slightly, the results could be very bad. Russell’s “you can’t fetch the coffee if you’re dead” is my favorite intuitive explanation of how instrumental convergence creates unexpected consequences. This requires explaining that we wouldn’t screw it up in quite such an obvious way, but the metaphor extends pretty deep into the more subtle complexities of goals and logic.
The other big points, in my observation, are “people screw up complex projects a lot, especially on the first try” and “you’d probably think it was dangerous if advanced aliens were landing, right?” One final intuitive point: even if AIs do always correctly follow human instructions, some human will eventually give them very bad instructions, accidentally or deliberately.
A similar point was made by this guy: https://forum.effectivealtruism.org/posts/KDjEogAqWNTdddF9g/long-termism-vs-existential-risk
Oh, that guy :)
I hadn’t seen that piece. I think my argument is the same, and the point is even more urgent and important now than it was two years ago. Many people’s timelines have shortened pretty dramatically since then.
The claim you make about the relation between utilitarianism and longtermism doesn’t work in the general case; longtermism requires more assumptions than utilitarianism does. A utilitarian can be a longtermist if the discount rate is zero or very low, or if the discounting is hyperbolic, but this doesn’t hold for other discounting models like exponential discounting, especially with anything like a 2-5% discount rate.
The discount rate here is a free parameter, so utilitarianism doesn’t always imply longtermism without additional assumptions (a quick numeric sketch below).
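To make the free-parameter point concrete, here’s a minimal sketch (my illustrative numbers, not anything from the comment above) of how much weight a few common discounting schemes give to a util enjoyed 20, 200, or 1000 years from now:

```python
# Illustrative only: weight placed on one util enjoyed t years from now
# under a few discounting schemes (rates chosen for illustration).

def exponential(t, r=0.03):   # exponential discounting at a 3% annual rate
    return (1 + r) ** -t

def hyperbolic(t, k=0.03):    # hyperbolic discounting with the same nominal rate
    return 1 / (1 + k * t)

def undiscounted(t):          # zero discount rate
    return 1.0

for t in (20, 200, 1000):
    print(f"t={t:>4}  exponential: {exponential(t):.2e}  "
          f"hyperbolic: {hyperbolic(t):.4f}  undiscounted: {undiscounted(t):.1f}")

# At 3% exponential discounting, utility 200 years out gets weight ~2.7e-03
# and utility 1000 years out ~1.5e-13, so the far future barely registers.
# Zero or hyperbolic discounting leaves those weights large enough for the
# far future to dominate; that extra assumption is what longtermism needs.
```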
That’s why I put a locally invalid react on the claim.
moral utilitarianism is a more specific thing than utility maximization, I think?
This is interesting, but it’s not important to the claims I’m making here. Most people aren’t remotely utilitarian, either.
Here’s why I said utilitarianism pretty firmly implies longtermism. If you care about all humans equally, it seems like you should logically care about them equally regardless of when they happen to live. Slapping a discount on future people means you’re not actually a utilitarian by that definition of the term; you’re a short-termist utilitarian. That’s how I’ve understood the most common definition of utilitarianism, but of course people can use the term differently.
There’s a strong argument that even if you care about everyone equally, you should do some sort of discounting over time, to account for how much harder it is to predict the effects of current decisions on farther-future utility. But that’s different from just applying a uniform discount to future utility.
I think it’s worth distinguishing between what I’ll call ‘intrinsic preference discounting’, and ‘uncertain-value discounting’. In the former case, you inherently care less about what happens in the (far?) future; in the latter case you are impartial but rationally discount future value based on your uncertainty about whether it’ll actually happen—perhaps there’ll be a supernova or something before anyone actually enjoys the utils! Economists often observe the latter, or some mixture, and attribute it to the former.
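A minimal sketch of why the two are easy to conflate (my illustration, assuming a constant annual hazard rate, which is not something the comment above commits to): with a constant yearly probability that the payoff is ever enjoyed, pure uncertain-value discounting produces exactly the same weights as an intrinsic exponential discount, so choice data alone can’t tell them apart.

```python
# Illustrative only: a constant annual survival probability p makes impartial,
# uncertainty-driven discounting look identical to an "intrinsic" exponential
# discount at rate r = 1/p - 1.

def uncertainty_weight(t, p=0.97):        # impartial, but the future might not arrive
    return p ** t

def intrinsic_weight(t, r=1 / 0.97 - 1):  # genuinely caring less about later utils
    return (1 + r) ** -t

for t in (10, 50, 100):
    print(t, round(uncertainty_weight(t), 6), round(intrinsic_weight(t), 6))

# The two columns match, so choice data alone can't distinguish an
# impartial-but-uncertain valuer from someone who truly prefers the present;
# you need independent evidence about the hazard rate itself.
```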
Agreed, that’s exactly what I was trying to get across in that last paragraph.
It seems better to ask what people would do if they had more tangible options, such that they could reach a reflective equilibrium which explicitly endorses particular tradeoffs. People mostly end up not caring about possibilities twenty years out because they don’t see how their current options constrain what happens in twenty years. This points to not treating their surface preferences as central, insofar as those preferences don’t follow from a reflective equilibrium informed by knowledge of all their available options. Even if one knows one’s principal can’t actually get that opportunity, one still has a responsibility to act on what the principal’s preferences would point to given more of the context.
They would care more about logical consistency if they knew more about its implications.
If we’re asking people to imagine a big empty future full of vague possibility, it’s not surprising that they’re ambivalent about long-termism. Describe an actual utopia (one hard for humans to conceive of in the first place) and how it depends on their coordination, show them the joy and depth of each life which follows, the way things like going on an adventure get taken to a transcendent level, and the preferences they already had will plausibly lead them to adopt a more long-termist stance. On the surface, how much people care scales with how tangible the options feel.
The problem is demonstrating that good outcomes are gated by what we do, and that those good outcomes are actually really good in a way hard for modern humans to conceive.
All good points.
I agree that people will care more if their decisions clearly matter in producing that future.
This isn’t easy to apply to the AGI situation, because which actions will help which outcomes is quite unclear and vigorously argued. Serious thinkers argue both for trying to slow down (PauseAI) and for defensive acceleration (Buterin, Aschenbrenner, etc.). It’s further complicated by the fact that many of us think accelerating will probably produce a better world for a few years, and then, shortly after that, a world in which humanity is dead or sadly obsolete. This pits short-term concerns directly against long-term ones.
I very much agree that helping people imagine either a very good or a very bad future will cause them to care more about it. I think that’s been established pretty thoroughly in the empirical decision-making literature.
Here I’m reluctant to say more than “futures so good they’re difficult to imagine,” since my actual predictions sound like batshit-crazy sci-fi to most people right now. Sometimes I say things like: people won’t have to work, and global warming will be easy to solve; then people fret about what they’d do with their time if they didn’t have to work. I’ve also tried talking about dramatic health and lifespan extension, to which people question whether they’d even want to live much longer (except old people, who never question it; ironically, they’re exactly the ones who probably won’t benefit from AGI-designed life extension).
Those are all specific points in agreement with your take that really good outcomes are hard for modern humans to conceive.
I agree that describing good futures is worth some more careful thinking.
One thought is that it might be easier for most folks to imagine a possible dystopian outcome in which humans aren’t wiped out but are made obsolete, and simply starve when they can’t compete with AI for wages in any job. I don’t think that’s the likeliest catastrophe, but it seems possible and might be a good point of focus.
Yeah, I’m in both camps. We should do our absolute best to slow down how quickly we approach building agents, and one way is leveraging AI that doesn’t rely on being agentic. That offers a way to do something like global compute monitoring, and it could also alleviate the short-term incentives that push toward building agents, by offering a safer avenue to the same ends. Insofar as a global moratorium stopping all large-model research is feasible, we should probably just do that.
It feels like there’s a missing genre of slice-of-life stories about people living in utopias. Arguably there are some works in related genres, though they might be weird to use for convincing people.
The tale could have two topias: one where it was the best of times, another where it was the worst of times, the distance to each made more palpable by the contrast, with the differences following from decisions made at the outset, and possibly using many of the same characters. This seems like a sensible thing for somebody to write; I can point to being personally better calibrated from thinking along those lines.