Recently, Eric Schmidt gave a talk at Harvard called “Our AI Future: Hopes and Hurdles Ahead”. The entire talk is available here, but one part stood out to me (around the 1:11:00 mark): his views on AI safety and on trusting AI labs to stop scaling if recursive self-improvement starts happening. Emphasis my own.
The technical term is recursive self-improvement. As long as the system isn’t busy learning on its own, you’re okay. I’ll parrot Sam’s speech at OpenAI. [...]
In the next few years we’ll get to AGI [...] Some number of years after that, the computers are going to start talking to each other, probably in a language that we can’t understand, and collectively their superintelligence [...] is going to rise very rapidly. My retort to that is, do you know what we’re going to do in that scenario? We’re going to unplug them all.
The way you would do this if you were an evil person [...] is you would simply say to the computer: “Learn everything. Try really hard. Start right now.” So the computer starts learning. It learns about French, it learns about science, it learns about people, it learns about politics. It does all of that, and at some point, it learns about electricity and it learns that it needs electricity, and it decides it needs more. So it hacks into the hospital and takes the electricity from the hospital to give it to itself.
That’s a simple example of why this is so dangerous. Most people in the industry who thought about this believe that when you begin recursive self-improvement, there will be a very serious regulatory response because of these issues. And that makes sense to me.
So his expectation-suggestion is that we will wait until we are one breakthrough away from uncontrolled takeoff, and then after we are in the middle of uncontrolled takeoff, attempt to stop it via legislation?
This sounds reasonable, until you examine the hidden assumptions in the phrase:
“We’re going to unplug them all”:
Humans have the capability and access to physically disconnect or disable all participating computers.
Let’s call this group of humans “Controllers”, and the participating computers “AIs” for brevity. Controllers need to communicate in real time on a separate network completely isolated from any influence by AIs. Ignoring the cost in treasure and political will of building this parallel communication system, note that it must be in place before any recursive improvement is expected to start. No self-respecting intelligence would refrain from interfering with a communication system intended to shut it down, by means ranging from physically shutting down power stations to impersonating Controllers.
Controllers will unanimously agree that unplugging is the necessary and correct response.
Maybe they do not need to unanimously agree in order for all of them to execute the global unplug decisions, if they are bound to do so by oath, conviction and training. However, this seems very analogous to humans pushing the button to launch intercontinental ballistic missiles, knowing they will do an untold amount of damage, knowing that spoofing of the command is a possibility, knowing that retaliation is to be expected. Humans are notoriously bad at unanimity, at acting in concert with 100% reliability.
How to coordinate the Controllers reliably without the help of any human-level AI is an open problem.
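To put a rough number on the unanimity problem, here is a minimal sketch. It assumes, purely for illustration, that each Controller acts independently and with the same probability of following through; the function name `prob_all_act` and the 99% figure are my own hypothetical choices, not anything from the talk.

```python
def prob_all_act(n_controllers: int, reliability: float) -> float:
    """Probability that every Controller acts.

    Illustrative assumptions: Controllers act independently, and each one
    follows through with the same probability `reliability`.
    """
    if n_controllers < 0:
        raise ValueError("n_controllers must be non-negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    # Under independence, the joint probability is the product of the
    # individual probabilities, i.e. reliability ** n_controllers.
    return reliability ** n_controllers


if __name__ == "__main__":
    # Hypothetical group sizes, each Controller 99% reliable.
    for n in (10, 100, 1000):
        print(f"{n} Controllers: {prob_all_act(n, 0.99):.4%}")
```

Under these assumptions, ten Controllers who are each 99% reliable all act about 90% of the time, and a hundred of them only about 37% of the time. Real Controllers would not be independent, so treat this as intuition rather than a forecast.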
Unplugging computers will not cause failures in critical systems such as hospitals, transportation, and financial systems that are catastrophic enough to dissuade Controllers.
Even making them hesitate for long enough to let the AIs recurse into an unstoppable force could be sufficient for failure.
Not only do the Controllers need to act unanimously, they need to do so in the face of real-world suffering.
Controllers will have reliable and timely information regarding the growth in intelligence of AIs.
A few considerations that make this a facile assumption:
- intelligence is not well defined, and its level is difficult to measure. It is not clear that a human intelligence can accurately judge the intelligence of something smarter than itself, other than by simple extrapolation of computing metrics that are meaningless outside of narrow domains, e.g. number of plies in chess, number of parameters of a deep learning model.
- AIs smart enough to start recursing would, it seems safe to assume, be able to dissemble and appear dumber than they actually are.
The capabilities of all AIs in service of all governments and large tech companies will be transparently made accessible to Controllers.
This assumption is perhaps the most Pollyannaish of all. At its core, the idea that “we” are going to unplug them all assumes a select few “we” that Eric Schmidt imagines himself, or someone like himself, to always be part of, not a world in which human-level AIs have proliferated and have become part of, and enablers of, all facets of human society. It is a failure of the imagination to assume that past decision makers will not lose control over the technologies they helped bring into the world.
In the next few hours we’ll get to noticeable flames [...] Some number of hours after that, the fires are going to start connecting to each other, probably in a way that we can’t understand, and collectively their heat [...] is going to rise very rapidly. My retort to that is, do you know what we’re going to do in that scenario? We’re going to unkindle them all.