How reasonable is taking extinction risk?

The people that make general artificial intelligence models, believe that these models could mean the end of mankind. As an example of how that could happen, a future version of ChatGPT might be so smart that the advanced chatbot could create enormous danger. It could disseminate information like how to easily build weapons of mass destruction or even just build those weapons itself. For instance, after ChatGPT was first made, many people used it as the basis for agents that could roam the internet by themselves. One, called ChaosGPT, was tasked with creating a plan for taking over the world. The idea was funny, but would have been less funny if the evil machine was advanced enough to actually do it. After AI surpasses human intelligence, one of those smarter-than-human AIs could be given the task to wipe out humanity and be smart enough to complete the task. Assessing the likelihood of this scenario can be done by looking at its three steps. Since (1) AI is likely to surpass human intelligence at some point, (2) some people might give the task of destroying humanity to a superintelligent machine (there is precedent) and (3) humanity has caused the extinction of many less intelligent species without even trying to, the scenario seems at least possible. This scenario is only one of many in which the arrival of superintelligent machines spells the end of humanity.

Should we let the people building ever smarter AI models continue, if they are thereby risking human extinction? And more broadly: is taking a risk of extinction ever reasonable?

If your answer is ‘no,’ then you can stop reading this article. All I have to tell you is that the leaders of OpenAI (the maker of ChatGPT), Anthropic (its main competitor) and Google Deepmind (Googles AI lab) as well as the top three most cited AI scientists have all publicly stated that AI models have a chance of causing human extinction.

If your answer is instead ‘yes, we can let them continue building these models,’ then you are open to arguments for risking human extinction for some benefit. You might have said that it was okay to test the first nuclear bomb when it was still unclear whether the bomb would set the atmosphere on fire. Or that it was okay to build the first particle accelerator when it was still unclear whether it would create a black hole. And in the future you might be convinced time and time again, that risk of human extinction is acceptable, because you are ‘open to reason’.

But if we risk human extinction time and time again, the risk adds up and we end up extinct with near certainty. So, at some point you have to say ‘No, from now on, human extinction can not be risked’ or it is near guaranteed to happen. Should you now say: ‘Okay, we can take a risk now but at some point in the future, we have to put a complete halt to this’, then you are open to postponing drawing a line in the sand. That means that you can keep getting convinced by those wanting to risk extinction, probably not always but time and time again, that a line can be drawn in the future rather than now. But if we allow the line to be pushed further and further back indefinitely, again, humanity is practically guaranteed to go extinct. So the point from which we no longer accept extinction risk cannot be in the future either and therefore has to be now.

With estimates of odds of human extinction in our lifetimes non-trivial, this is no mere intellectual exercise. This concerns us and our loved ones. Benefits, like promises of economic growth or curing of diseases, are meaningless if the price is everyone on earth dying. To visualise this, imagine having to play Russian roulette a hundred times in a row with a million dollars of prize money every time you survive. It could be a billion for each win and the outcome would still be death. If we allow risk of human extinction, through building general artificial intelligence models or something else, we as good as guarantee human extinction at some point and it might already happen in our lifetimes.

There are however two circumstances in which taking an extinction risk can be worth it. These two circumstances have to do with other extinction risks and the fact that there are fates worse than death.

The first is one where problems like climate change and nuclear war also pose an extinction risk and AI might help mitigate those risks. It might do so by providing a blueprint for transitioning to clean energy and coming up with a way to let the great nuclear powers slowly build down their nuclear bomb supply. Should AI convincingly lower the overall risk of extinction, it is warranted to continue building greater AI models, even if it brings its own risks. The argument can then be changed to: ‘If increasing extinction risk is reasonable once, it will be reasonable in the future, until the risk materialises at some point. But allowing near certain extinction is not reasonable, so increasing extinction risk once is not reasonable either.’ AI mitigating other extinction risks to a degree that it cancels out its own might be a far cry, but it is worth looking into.

The second acceptable circumstance is one in which the alternative is a fate worse than human extinction. An idea that is commonplace in American AI companies is that China will continue to build AI and try to take over the world with it. One reaction to this possibility is to accept the extinction risk of building AI to stay ahead of the China and prevent a suppressive government with global power. A world like that could look like the world described in 1984 by George Orwell:

“There will be no curiosity, no enjoyment of the process of life. All competing pleasures will be destroyed. But always— do not forget this, Winston— always there will be the intoxication of power, constantly increasing and constantly growing subtler. Always, at every moment, there will be the thrill of victory, the sensation of trampling on an enemy who is helpless. If you want a picture of the future, imagine a boot stamping on a human face— forever. ”

Figuring out if AI mitigates or worsens extinction risk is an important question that needs work. So is figuring out if a fate worse than extinction is likely enough to warrant taking extinction risk to prevent it. But if AI only increases the risk of human extinction and at the same time a fate worse than extinction is not sufficiently likely, then we have to draw a line in the sand. Do we draw the line now or in the future? From the moment there is extinction risk, but rather before, we need to draw a line. We don’t know when AI models will be smart enough to pose a risk of human extinction and we cannot afford to wait and see. Because if waiting and seeing is reasonable, it is reasonable every now and then, until humanity goes extinct. But allowing human extinction with near certainty is not reasonable, so neither is waiting and seeing.