[Crosspost] An AI Pause Is Humanity’s Best Bet For Preventing Extinction (TIME)
Link post
Otto Barten is director of the Existential Risk Observatory, a nonprofit aiming to reduce existential risk by informing the public debate.
Joep Meindertsma is founder of PauseAI, a movement campaigning for an AI Pause.
The existential risks posed by artificial intelligence (AI) are now widely recognized.
After hundreds of industry and science leaders warned that “mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war,” the U.N. Secretary-General recently echoed their concern. So did the prime minister of the U.K., who is also investing £100 million in AI safety research aimed largely at preventing existential risk. Other leaders are likely to follow in recognizing AI’s ultimate threat.
In the scientific field of existential risk, which studies the most likely causes of human extinction, AI is consistently ranked at the top of the list. In The Precipice, a book by Oxford existential risk researcher Toby Ord that aims to quantify human extinction risks, the likelihood of AI causing human extinction exceeds that of climate change, pandemics, asteroid strikes, supervolcanoes, and nuclear war combined. One would expect that even for severe global problems, the risk of full human extinction is relatively small, and this is indeed true for most of the risks above. AI, however, may cause human extinction if only a few conditions are met. Among them is human-level AI, defined as an AI that can perform a broad range of cognitive tasks at least as well as we can. These ideas have been studied for years, but recent AI breakthroughs have underlined their urgency: AI may be getting close to human level already.
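To see why the “combined” claim holds, here is the rough arithmetic, using the approximate order-of-magnitude estimates from The Precipice (cited here from the book’s summary of risks over the coming century, not from this article; Ord presents them as rough orders of magnitude rather than precise probabilities):

$$
\underbrace{\tfrac{1}{30}}_{\text{engineered pandemics}}
+ \underbrace{\tfrac{1}{1000}}_{\text{climate change}}
+ \underbrace{\tfrac{1}{1000}}_{\text{nuclear war}}
+ \underbrace{\tfrac{1}{10\,000}}_{\text{supervolcanoes}}
+ \underbrace{\tfrac{1}{1\,000\,000}}_{\text{asteroid strikes}}
\approx 0.035
\;<\; \tfrac{1}{10} \approx \text{Ord's estimate for unaligned AI}.
$$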
Recursive self-improvement is one of the reasons why existential-risk academics think human-level AI is so dangerous. Because human-level AI could do almost all tasks at our level, and because doing AI research is one of those tasks, advanced AI should be able to improve the state of AI itself. Constant self-improvement would create a positive feedback loop with no scientifically established limits: an intelligence explosion. The endpoint of this intelligence explosion could be a superintelligence: a godlike AI that outsmarts us the way humans often outsmart insects. We would be no match for it.
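As a purely illustrative sketch (our own toy model, not one taken from the article or the literature it cites), the feedback loop can be caricatured in a few lines of Python: if we assume the rate of improvement scales with the square of current capability, because the system is both the researcher and the thing being improved, then capability stays quiet for a long time and then runs away abruptly.

```python
# Toy caricature of recursive self-improvement. The growth law dC/dt = k * C^2
# is an illustrative assumption, not a model from the article. Unlike ordinary
# exponential growth, this hyperbolic growth diverges in finite time, which is
# the intuition behind an "intelligence explosion".

def toy_intelligence_explosion(c0=1.0, k=0.05, dt=0.1, cap=1e6, max_steps=1000):
    """Crude Euler iteration of dC/dt = k * C**2; stop once capability passes `cap`."""
    t, c = 0.0, c0
    history = [(t, c)]
    while c < cap and len(history) <= max_steps:
        c += k * c * c * dt   # improvement rate scales with capability squared
        t += dt
        history.append((t, c))
    return history


if __name__ == "__main__":
    history = toy_intelligence_explosion()
    t_mid, c_mid = history[len(history) // 2]
    t_end, c_end = history[-1]
    # Halfway through the run, capability has barely moved; by the end it has exploded.
    print(f"t = {t_mid:5.1f}: capability ~ {c_mid:.1f}")
    print(f"t = {t_end:5.1f}: capability ~ {c_end:.3g}")
```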
A godlike, superintelligent AI
A superintelligent AI could therefore likely execute any goal it is given. Such a goal would initially be introduced by humans, but it might come from a malicious actor, might not have been thought through carefully, or might get corrupted during training or deployment. If the resulting goal conflicts with what is in the best interest of humanity, a superintelligence would aim to execute it regardless. To do so, it could first hack large parts of the internet and then use any hardware connected to it. Or it could use its intelligence to construct narratives that are extremely convincing to us. Combined with hacked access to our social media timelines, it could create a fake reality on a massive scale. As Yuval Harari recently put it: “If we are not careful, we might be trapped behind a curtain of illusions, which we could not tear away—or even realise is there.” As a third option, after either legally making money or hacking our financial system, a superintelligence could simply pay us to perform whatever actions it needs from us. And these are just some of the strategies a superintelligent AI could use to achieve its goals; there are likely many more. It is like playing chess against grandmaster Magnus Carlsen: we cannot predict the moves he will play, but we can predict the outcome. We lose.
But why would a superintelligence want to make humanity extinct? In fact, it might not explicitly want that at all. Humanity’s extinction might be a mere side effect of executing another goal to its extremity. When we humans pursue our own goals, such as producing food or growing our economy, we tend not to pay much attention to the effects on other species. This routinely leads to the extinction of animal species as a mere side effect of us maximising our goals. In a similar fashion, an uncontrollable superintelligence could lead to our extinction as a side effect of the AI pushing its goals, whatever they may be, to their limits.
So how can we avoid AI existential risk? Humanity now faces a stark choice: pause AI at a safe level, or continue development indefinitely. Unsurprisingly, the AI labs want the latter. To continue, however, means literally betting our lives on these labs being able to solve an open scientific problem called AI Alignment.
AI Alignment is the approach that leading labs such as OpenAI, DeepMind, and Anthropic are currently taking to prevent human extinction. AI Alignment does not attempt to control how powerful an AI gets, does not attempt to control what exactly the AI will be doing, and does not even necessarily attempt to prevent a potential takeover from happening. Since the labs anticipate they will not be able to control the superintelligent AI they are creating, they accept that such an AI will act autonomously and unstoppably, but aim to make it act according to our values. As Richard Ngo, a leading alignment researcher working at OpenAI, puts it, “I doubt we’ll be able to make every single deployment environment secure against the full intellectual powers of a superintelligent AGI.” There are serious problems with this alignment approach, however, both fundamental and technical.
What values should an AI have?
Fundamentally, there is of course no agreement on what values we have. What OpenAI founder Sam Altman wants a superintelligence to do may well be completely different from what, say, a textile worker in Bangladesh wants it to do. Aggregating human preferences was a thorny issue long before AI, and it will get even more complicated when we have a superintelligence around that might be able to quickly, completely, and globally implement the wishes of a handful of CEOs. Democracy, however imperfect, is the best solution we have found so far for aggregating values. It is, however, unclear whether our current combination of national political systems can control a godlike AI. In fact, it is unclear whether any organisation or system would be up to the task.
Beyond disagreement over current values, humans have historically not been very good at predicting the future externalities of new technologies. The climate crisis, for one, can be seen as an unforeseen outcome of technologies such as steam and internal combustion engines. Ever more powerful technology has the potential to create ever larger externalities. It is impossible to foresee what the negative side effects will be of actually implementing some interpretation of humanity’s current values. They might make climate change pale in comparison.
There are more unsolved fundamental issues with aligning a superintelligence, but AI companies don’t even get that far. Even making today’s large neural networks reliably do what anyone wants is a technical problem we currently cannot solve. This is called inner misalignment, and it has already been observed in current AI: a model trained on a limited dataset pursues the wrong goal when it is released into the real world. In one well-known reinforcement-learning experiment, for instance, an agent trained on levels where the reward object always sat at the right edge learned “go right” rather than “collect the reward,” and kept going right when the object was moved. For a superintelligence, this could have a catastrophic outcome. This problem is, according to OpenAI’s recent document Governance of Superintelligence, an “open research question.” But not to worry: they say they are “putting a lot of effort” into it.
Even if individuals such as Altman care about the existential threats of superintelligence, the current market dynamics of competing AI labs do not incentivize safety. The GPT-4 model built by OpenAI was tested and fine-tuned for seven months before being made public alongside a detailed report about its safety. In contrast to OpenAI’s safety efforts, a panicked Google released its competing PaLM 2 model just months later, without any mention of similar safety tests. Now even open-source models like Orca are performing at a ChatGPT level, without any safety analysis. The race is frantic, pushing these companies to take more risks to stay ahead. The scope of these risks is gigantic. As Jaan Tallinn, an investor in the major AI lab Anthropic, puts it: “I’ve not met anyone in AI labs who says the risk [from training a next-generation model] is less than 1% of blowing up the planet.”
Let us be clear: continuing risky AI development, and eventually undergoing the intelligence explosion that leads to an uncontrollable superintelligence, is a terrible idea. We will not be able to navigate it safely. We lack the technical solutions, the coordination, the insight, the wisdom, and the experience to do something this complex right on the first try. As humans, we are good at solving problems with trial-and-error processes where we get lots of tries. But in the case of an intelligence explosion, we cannot use this strategy, since the first trial of uncontrolled superintelligence could be disastrous.
Why we should pause AI development
So what is the alternative? An AI Pause. This would mean that governments prohibit training any AI model whose capabilities are likely to exceed GPT-4’s. Whether a model crosses that line can be determined from the amount of computation required for training, corrected over time for improvements in algorithmic efficiency. Models at this scale can only be trained by large AI companies in large data centers, which makes enforcing such a pause realistic.
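As a minimal sketch of how such a compute-based threshold might be operationalised (the FLOP ceiling and the annual efficiency factor below are hypothetical illustrations of ours, not figures proposed in the article or in any existing regulation):

```python
# Hypothetical compute-based training threshold, assuming regulators peg a
# ceiling to roughly GPT-4-scale training compute and tighten it over time
# to account for algorithmic-efficiency gains. All numbers are illustrative.

BASELINE_FLOP_CEILING = 2.5e25     # assumed stand-in for "GPT-4-level" training compute
EFFICIENCY_GAIN_PER_YEAR = 2.0     # assumed annual factor of algorithmic improvement


def effective_ceiling(years_since_baseline: float) -> float:
    """Raw-FLOP ceiling after correcting for assumed algorithmic-efficiency gains."""
    return BASELINE_FLOP_CEILING / (EFFICIENCY_GAIN_PER_YEAR ** years_since_baseline)


def training_run_allowed(planned_flop: float, years_since_baseline: float) -> bool:
    """True if a planned training run stays under the time-corrected ceiling."""
    return planned_flop <= effective_ceiling(years_since_baseline)


if __name__ == "__main__":
    # The same raw compute buys more capability as algorithms improve, so a run
    # that is allowed at the baseline would be blocked two years later.
    print(training_run_allowed(planned_flop=2.0e25, years_since_baseline=0))  # True
    print(training_run_allowed(planned_flop=2.0e25, years_since_baseline=2))  # False
```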
Since the leading labs are currently located in the U.S. and the U.K., implementing a pause at a national level in those two countries could be done on very short notice. In the medium term, the competitive dynamics between nations will require international legislation and collaboration, which an international AI agency could help coordinate. The U.K. is already planning to host an AI safety summit this autumn, which would be a fitting moment to sign a treaty implementing an international AI Pause. Excluding important countries, especially China, from such a summit might not be wise, since they will be needed as well in the medium term. For the longer term, hardware regulation—such as making sure new consumer hardware cannot be used for large-scale AI training, while closely monitoring all hardware that can be—is a promising option for practically enforcing a pause. Research into this topic needs to be scaled up rapidly to arrive at robust and implementable solutions.
Further AI development may continue only after regulatory authorities are convinced that training and deploying a certain AI model will not risk an uncontrolled intelligence explosion or a takeover scenario. Second-order effects, such as making hard-to-control open-source development easier, should be taken into account in such decisions.
Continuing beyond the red line, and thus risking an intelligence explosion or a takeover, should only be allowed after meeting the strictest of requirements. First of all, the technical alignment problem should be decisively solved. Other major issues, such as aggregating humanity’s different value systems, limiting unforeseen externalities, and many more, will also need to be addressed.
A pause may be seen by some as unrealistic. They might say that we cannot stop technological progress for a speculative scenario, that market forces are too strong to counter, or that rival countries can never coordinate. But bear in mind that as a society, we have barely begun to let the truth sink in: we are risking human extinction here. AI existential risk is still not taken as seriously as it should be. But that is changing. Leading AI academics, such as ‘fathers of AI’ Geoffrey Hinton and Yoshua Bengio, have issued public warnings about AI’s existential risk because they saw the technology getting more capable. This process is unlikely to stop: AI’s capabilities will grow further, and more and more leaders of the debate will sound the alarm. When the full weight of our situation sinks in, measures that may appear unrealistic at present could rapidly gain support.
As humanity, we will realise the situation that we are in. We will come together, and we will find a solution, as we always have. And, sooner than we think, we will implement an AI Pause.