How possible is it that a misaligned, narrowly-superhuman AI is launched, fails catastrophically with casualties in the 10^4–10^9 range, and [the remainder of] humanity is “scared straight” and from that moment onward treats AI technology the way we treat nuclear technology now—i.e. effectively strangles it into stagnation with regulations—or even more conservatively? From my naive perspective it is somewhat plausible politically, based on the only example of ~world-destroying technology that we have today. And this list of arguments doesn’t seem to rule out this possibility. Is there an independent argument by EY as to why this is not plausible technologically? I.e., why AIs narrow/weak enough not to be inevitably world-destroying, but powerful enough to fail catastrophically, are unlikely to be developed [soon enough]?
(To be clear, the above scenario is nothing like a path to victory and I’m not claiming it’s very likely. More like a tiny remaining possibility for our world to survive.)
I’m sure there are circumstances under which a “rogue AI” does something very scary and leads to a very serious attempt to regulate AI worldwide, e.g. with coordination at the level of the UN Security Council. The obvious analogy once again concerns nuclear weapons; proliferation in the 1960s led to the creation of the NNPT, the Nuclear Nonproliferation Treaty. Signatories agree that only the UNSC permanent members are allowed to have nuclear weapons, and in return the permanent members agree to help other signatories develop nonmilitary uses of nuclear power. The treaty definitely helped to curb proliferation, but it’s far from perfect. The official nuclear weapons states are surely willing to bend the rules and assist allies in obtaining weapons capability if it is strategically desirable and can be done deniably; and not every country signed the treaty, and some of the non-signatories (e.g. India, Pakistan) are now nuclear weapons states.
Part of the NNPT regime is the IAEA, the International Atomic Energy Agency. These are the people who, for example, carry out inspections in Iran. Again, the system has all kinds of troubles: it’s surrounded by spy plots and counterplots, and many nations would like to see the Security Council reformed so that the five victorious allies from World War 2 (US, UK, France, Russia, China) don’t hold all the power. But still, something like this might buy a little time.
If we follow the blueprint that was adopted to fight nuclear proliferation, the five permanent members would be in charge, and they would insist that potentially dangerous AI activities in every country take place under some form of severe surveillance by an International Artificial Intelligence Agency, while promising to also share the benefits of safe AI with all nations. Despite all the foreseeable problems, something like this could buy time, but all the big powers would undoubtedly keep pursuing AI, in secret government programs or in open collaborations with civilian industry and academia.
The important difference is that nuclear weapons are destructive because they worked exactly as intended, whereas the AI in this scenario is destructive because it failed horrendously. Plus, the concept of rogue AI has been firmly ingrained into public consciousness by now, which afaik was not the case with extremely destructive weapons in the 1940s [1]. So hopefully this will produce more public outrage (and fear among the elites themselves) ⇒ stricter external and internal limitations on all agents developing AIs. But in the end I agree, it’ll only buy time, maybe a few decades if we are lucky, to solve the problem properly or to build more sane political institutions.
[1] Yes, I’m sure there was a scifi novel or two before 1945 describing bombs of immense power. But I don’t think it was anywhere near as widely known as The Matrix or Terminator.
I’m interested in getting predictions for whether such an event would get all (known) labs to stop research for even one month (not counting things like “the internet is down so we literally can’t continue”).
I expect it won’t. You?
It might, given some luck and assuming all the pro-safety actors play their cards right. That’s assuming by “all labs” you mean “all labs developing AIs at or near the then-current limit of computational power”, or something along those lines, and by “research” you mean “practical research”, i.e. training and running models. The model I have in mind is not that everyone involved will intellectually agree that such research should be stopped, but that a large enough share of the public and of governments will get scared and exert pressure on the labs. Consider how most of the world was able to (imperfectly) coordinate to slow Covid spread, or how nobody has prototyped a supersonic passenger jet in decades, or, again, nuclear energy—we as a species can do such things in principle, even if often for the wrong reasons.
I’m not informed enough to give meaningful probabilities on this, but to honor the tradition, I’d say that, given a catastrophe with an immediate, graphic death toll >=1mln happening in or near the developed world, there’s a >75% probability that ~all seriously dangerous activity will be stopped for at least a month, and >50% that it’ll be stopped for at least a year. With the caveat that the catastrophe is unambiguously attributed to the AI: think “Fukushima was a nuclear explosion”, not “Covid maybe sorta kinda plausibly escaped from the lab but well who knows”.
I’d be pretty happy to bet on this and then keep discussing it, wdyt? :)
Here are my suggested terms:
All major AI research labs that we know about (DeepMind, OpenAI, Facebook Research, China, perhaps a few more*)
Stop “research that would advance AGI” for 1 month, defined not as “practical research” but as “research that will be useful for AGI coming sooner”. So, for example, if they stopped only half of their “useful to AGI” research, but they did it for 3 months, you win. If they stopped training models but kept doing the stuff that is the 90% bottleneck (which some might call “theoretical”), I win.
*You judge all these parameters yourself however you feel like
I’m just assuming you agree that the labs mentioned above are currently working towards AGI, at least for the purposes of this bet. If you believe something like “OpenAI (and the other labs) didn’t change anything about their research, but hey, they weren’t doing any relevant research in the first place”, then say so now.
I might try to convince you to change your mind, or ask others to comment here, but you have the final say
Regarding “the catastrophe was unambiguously attributed to the AI”—I ask that you judge whether it was unambiguously because of AI, and that you don’t rely on public discourse, since the public can’t seem to unambiguously agree on anything (even on vaccines being useful).
I suggest we bet $20 or so mainly “for fun”
What do you think?
To start off, I don’t see much point in formally betting $20 on an event conditioned on something I assign <<50% probability of happening within the next 30 years (a powerful AI is launched, fails catastrophically, we’re both still alive to settle the bet, and the failure is unambiguously attributed to the AI). I mean, sure, I can accept the bet, but largely because I don’t believe it matters one way or another, so I don’t think it counts from the epistemological virtue standpoint.
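To make that concrete, here is a back-of-the-envelope sketch. The numbers are purely illustrative placeholders (the 10% figure for the conditioning event and the 75% conditional win probability are assumptions for the example, not figures either of us has committed to); the point is just that a conditional bet that rarely resolves moves very little money and carries correspondingly little epistemic weight.

```python
# Illustrative expected-value arithmetic for a conditional bet.
# All numbers below are placeholder assumptions, not anyone's actual estimates.

p_condition = 0.10   # assumed: the conditioning event occurs and is settleable within 30 years
p_i_win = 0.75       # assumed: conditional on that, the "labs stop for a month" side wins
stake = 20           # dollars at risk for whoever loses

p_resolves = p_condition                      # the bet pays out (either way) only if the condition occurs
expected_turnover = p_resolves * stake        # expected money changing hands over 30 years
expected_net = p_condition * (p_i_win - (1 - p_i_win)) * stake  # expected net gain for the favored side

print(f"P(bet resolves at all) ~ {p_resolves:.0%}")
print(f"Expected money changing hands ~ ${expected_turnover:.2f}")
print(f"Expected net gain for the favored side ~ ${expected_net:.2f}")
```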
But I can state what I’d disagree with in your terms if I were to take it seriously, just to clarify my argument:
Sounds good.
Mostly sounds good, but I’d push back that “not actually running anything close to the dangerous limit” sounds like a win to me, even if theoretical research continues. One pretty straightforward Schelling point for a ban/moratorium on AGI research is “never train or run anything > X parameters”, with X << the dangerous level under the then-current paradigm. It may be easier to explain to the public and politicians than many other potential limits, and this is important. It’s also much easier to control—checking that nobody collects and uses a gigashitton of GPUs [without supervision] is easier than checking every researcher’s laptop. Additionally, we’ll have nuclear weapons tests as a precedent.
That’s the core of my argument, really. If a consortium of 200 world experts says “this happened because your AI wasn’t aligned, let’s stop all AI research”, then Facebook AI or China can tell the consortium to go fuck themselves, and I agree with your skepticism that it’d make all labs pause for even a month (see: gain-of-function research, Covid). But if it becomes public knowledge that a catastrophe with 1mln casualties happened because of AI, then it can trigger a panic that will make both world leaders and the public really, honestly want to restrict this AI stuff, and it will both justify and enable the draconian measures required to make every lab actually stop the research. Similar to how the panics about nuclear energy, terrorism and Covid worked. I propose defining “public agreement” as “leaders of the relevant countries (defined as the countries housing the labs from p.1, so the US, China, maybe the UK and a couple of others) each issue a clear public statement saying that the catastrophe happened because of an unaligned AI”. This is not an unreasonable ask; they have been this unanimous about quite a few things before, including vaccines.