Some people like to tell themselves that surely we’ll get an AI warning shot and that will wake people up; but this sounds to me like wishful thinking imported from a world that mounted a competent response to the pandemic warning shot we just got.
When I think “AI warning shots”, the warning shot I’m usually imagining involves the death of 10-50% of working-age and politically relevant people, if not from the shot itself then from the social and political upheaval that follows. The “warning” in “warning shot” is a warning that the relevant decision makers (congresspeople, etc.) will die if the problem remains unsolved, not that a few million miscellaneous grandmothers will, whose early deaths can safely be ignored in favor of writing more blog posts about the culture war. Thus this category of events generally doesn’t include flash crashes, some kind of “semi”-advanced computer worm, or an industrial accident or economic depression that costs us a hundred billion dollars but is then resolved by some heroic engineers, unless one of those happens to instill in lawmakers a fear that AI systems are personally dangerous to them.
One specific (unlikely) example, which I can elaborate upon in detail and which I think could come before actually existentially threatening systems, is an AI destroying the internet by doing what regular hackers do, on a much shorter timetable. It would be within the capabilities of existing, not-super-impressive humans, with 100 subjective person-years to burn, to find or copy a couple dozen different zero-day bugs à la EternalBlue, write an excellent worm that trivially evades most commercial IDSes, and release it along with several delayed-release backup versions carrying different exploits and signatures for another bag of common network services, set to fire when the initial wave starts to become ineffective. An AGI that wasn’t smart enough to build nanobots could still pretty successfully shut down or commandeer an arbitrarily large number of web servers and banks and (internet-adjacent) electrical grids and factories and oil pipelines and self-driving cars all around the same time, and keep them disabled long enough that humans replace them with things that don’t use computers, because fighting the worms is taking too long and people have begun starving. Communities of worms built this way would be capable of consistently destroying almost all important unairgapped computers in an obfuscated and hard-to-debug way, even well after the incident response teams working to guard these things manage to figure out that bootkit #3 gets onto the network from IoT toaster #2 on the next office floor. This technique destroys or commandeers lots of “airgapped” subnets too, because most things people claim are airgapped, outside of a highly specific national security context, are only nominally airgapped: they accidentally connect to public DNS, or regularly have some system administrator walking in an “inspected” USB drive with the latest version of Debian or whatever. Maybe some makeshift bombs go off or some second-world nations’ poorly protected drones get commandeered too, but I expect not nukes, because those are actually, consistently airgapped.
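To make the delayed-release-waves dynamic concrete, here is a toy compartmental sketch in the spirit of textbook epidemic models, not anything operational: every parameter and threshold in it is invented for illustration. The only point it demonstrates is that a worm shipping several independent exploits, each held in reserve until defenders contain the previous one, keeps the compromised fraction of hosts high for several times longer than a single-exploit worm would.

```python
# Toy model of staggered worm waves vs. defender patching.
# All numbers are invented for illustration; this is a cartoon of the dynamic
# described above, not a simulation of any real network, exploit, or IR process.

beta = 0.5             # per-step spread rate of the currently active exploit
gamma = 0.05           # per-step rate at which defenders clean and patch hosts
waves = 4              # number of pre-packaged exploits shipped with the worm
relaunch_below = 0.05  # fire the next wave when compromised fraction drops below this

compromised = 0.001    # fraction of hosts currently compromised (seed infection)
patched = 0.0          # fraction immune to the *currently active* exploit only
wave = 1

for step in range(300):
    susceptible = max(0.0, 1.0 - compromised - patched)
    new_infections = beta * compromised * susceptible
    cleaned = gamma * compromised
    compromised += new_infections - cleaned
    patched += cleaned
    # Delayed-release backup: a different bug with a different signature, so hosts
    # patched against the previous wave are susceptible again.
    if compromised < relaunch_below and wave < waves:
        wave += 1
        patched = 0.0
    if step % 25 == 0:
        print(f"step {step:3d}  wave {wave}  compromised {compromised:.1%}")
```

With a single wave, the compromised fraction collapses for good once defenders patch faster than the exploit spreads; with staggered waves, the same cleanup effort has to be repeated from scratch each time, which is the “keep them disabled long enough that humans replace them” part of the story above.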
If this is extremely unlikely to happen (which, like any overly detailed story, is probably true), I don’t think it’s for a reason like “if an AI can do that, it can definitely build nanobots”. It’s not even that difficult to accomplish. A team of me and maybe three other people I personally know could do it right now with maybe a ~50% success rate, if we had an absurd amount of prep time inside of a Dragon Ball Z hyperbolic time chamber to do it in. The primary reason people haven’t launched the worm-nuke already isn’t that people aren’t smart enough; it’s that they don’t have the time, that there’s no motivation for anyone to do anything like this, that they fear arrest if they tried to gather a team to coordinate it, and, most of all, that the territory is changing too quickly. And these things (relative speed, cooperative replicas, single-mindedness toward weird goals) are the first critical advantages over regular people I expect early intelligent systems to have. Transformers already write code at an absurd pace compared to living, breathing humans, and despite the age-old protestations about how humans occupy only a small window on the intelligence spectrum, so far SOTA models have been climbing pretty steadily across the village-idiot-to-Einstein range in lockstep with log(training cost). Assuming we solve the context window problem, a cluster of replicas of DeepMind’s private “Codex” implementation will be able to do this in 5-10 years, and it’s not clear to me, without further assumptions, that such a cluster would also therefore be capable of doing something instrumentally, existentially threatening on its first critical try.
Needless to say, we are not prepared for all computers to suddenly stop working or act destructively for extended periods of time, in the same way that we are not prepared to go back to hunting and foraging even if that could somehow, in theory, sustain modern populations. And if it’s somehow obvious (say, partly based on the timing of a very public release or announcement) that DeepMind launched the thing that ended up killing 20% of the populace and/or sending us back to a 1970 standard of living, every former employee of Google’s gets declared guilty by association, whether they were “ML capabilities engineers” or not. At minimum, Demis Hassabis and a bunch of key figures in Google leadership get executed or permanently imprisoned, because the party (or parties) directing the rioters or the army during the ensuing martial law will rationally and predictably exploit that opportunity to mobilize a base and look competent and tough. That might happen even if there’s only a 50% chance they actually did it, because Google is not China and is not a politically inconvenient scapegoat for any major faction in Western politics. And this is a sample of the class of events that I suggest would shake things up enough that coordination on the alignment problem, or on the delaying-AGI problem, might become possible, not some weaksauce COVID-19 pseudo-emergency.
I agree that events of that magnitude would wake people up. I just don’t think we’ll get events of that magnitude until it’s too late.
Me neither, but I wanted to outline a Really Bad, detailed, pre-nanofactory scenario, since the last few times I’ve talked to people about this, they kept either underestimating its consequences or asserting without basis that it was impossible. Also see the last paragraph.