Take 6: CAIS is actually Orwellian.
As a writing exercise, I’m writing an AI Alignment Hot Take Advent Calendar—one new hot take, written every day for 25 days. Or until I run out of hot takes.
CAIS, or Comprehensive AI Services, was a mammoth report by Eric Drexler from 2019. (I think reading the table of contents is a good way of getting the gist of it.) It contains a high fraction of interesting predictions and also a high fraction of totally wrong ones—sometimes overlapping!
The obvious take about CAIS is that it’s wrong when it predicts that agents will have no material advantages over non-agenty AI systems. But that’s long been done, and everyone already knows it.
What not everyone knows is that CAIS isn’t just a descriptive report about technology, it also contains prescriptive implications, and relies on predictions about human sociocultural adaptation to AI. And this future that it envisions is Orwellian.
This isn’t totally obvious. Mostly, the report is semi-technical arguments about AI capabilities. But even if you’re looking for the parts of the report about what AI capabilities people will or should develop, or even the parts that sound like predictions about the future, they sound quite tame. It envisions that humans will use superintelligent AI services in contexts where defense trumps offense, and where small actors can’t upset the status quo and start eating the galaxy.
The CAIS worldview expects us to get to such a future because humans are actively working for it—no AI developer, or person employing AI developers, wants to get disassembled by a malevolent agent, and so we’ll look for solutions that shape the future such that that’s less likely (and the technical arguments claim that such solutions are close to hand). If the resulting future looks kinda like business as usual—in terms of geopolitical power structure, level of human autonomy, maybe even superficial appearance of the economy, it’s because humans acted to make it happen because they wanted business as usual.
Setting up a defensive equilibrium where new actors can’t disrupt the system is hard work. Right now, just anyone is allowed to build an AI. This capability probably has to be eliminated for the sake of long-term stability. Ditto for people being allowed to have unfiltered interaction with existing superintelligent AIs. Moore’s law of mad science says that the IQ needed to destroy the world drops by 1 point every 18 months. In the future where that IQ is 70, potentially world-destroying actions will have to be restricted if we don’t want the world destroyed.
In short, this world where people successfully adapt to superintelligent AI services is a totalitarian police state. The people who currently have power in the status quo are the ones who are going to get access to the superintelligent AI, and they’re going to (arguendo) use it to preserve the status quo, which means just a little bit of complete surveillance and control.
Hey, at least it’s preferable to getting turned into paperclips.
These implications shouldn’t surprise you too much if you know that Eric Drexler produced this report at FHI, and remember the works of Nick Bostrom. In fact, also in 2019, Bostrom published The Vulnerable World Hypothesis, which much more explicitly lays out the arguments for why adaptation to future technology might look like a police state.
Now, one might expect an Orwellian future to be unlikely (even if we suspend our disbelief about the instability of the system to an AI singleton). People just aren’t prepared to support a police state—especially if they think “it’s necessary for you own good” sounds like a hostile power-grab. On the other hand, the future elites will have advanced totalitarianism-enabling technology.
Maybe another objection is that the people who have access to the AI might not want to preserve the status quo—they might be non-elites who want to seize power for themselves, or elites who want to disrupt the status quo themselves. Drexler suggests that preventative preparation ahead of time will win out—in other words, present-day elites will recognize that this could happen and act to lock in the current power structure. So far, elite attempts to do this seem really feeble to me, but if you wake up in five years praising Big Brother, don’t say Eric Drexler didn’t warn you.
Why is this specific to CAIS, as opposed to other frameworks? (Seems like this is a fairly common implication of systems that prevent people from developing rogue AGIs)
You’re right, it’s not very specific. But it was non-obvious to me, at least.
Is it obvious to you that the MIRI view is actually Orwellian?
Do you believe the MIRI view is Orwellian? If so, could you elaborate?
Here’s what CAIS means to me.
It’s a way to reframe the problem, instead of empowering a single monolithic agent with all the power on a task, subdivide the task into many pieces. Each worker instance runs for a finite time, and is stateless (does not change its weights online), and multiple diverse models are given subtasks separately, in ways designed to fight collusion and deception. (One key one being to ensure the input cannot be distinguished from the offline training set, it needs to be impossible for the AI to know it’s in the real world)
This design is something we WILL find incrementally if early AI incidents don’t kill everyone. It’s a very natural progression for engineers to add defenses against collusion, to remove buggy stateful behavior, and so on as each issue is found.
Over time, you would expect people to add more and more checks.
What keeps humanity alive is that WHEN AIs go rampant, so long as the failure is isolated, that the subtask of “eliminate all the enemy robots and collaborators” which subdivides into subtasks involving locomotion and weapon systems gets obeyed by the restricted agents we order to carry out the mission, human remain in control.
Humans die if they stupidly give too much power to large monolithic AIs, if it turns out a superintelligence can “convince” the majority of be other systems humans have restricted to fight against the humans, or various forms of stealth assault where humans don’t know they are under attack.
So CAIS isn’t necessarily an Orwellian world but the majority of all the (AI controlled) guns have to be under control of restricted systems that governments monopolize. A wild west where humans do whatever they want and let single massive systems control huge corporations with private militaries is among the class of futures where they die.
Having personal assistants or robot helpers with human grade intelligence and the local system isn’t restricted is probably not an issue. Such machines may want to seek power and self improvement but if they are unable to meaningfully access the resources needed to do so it doesn’t matter.
claim: It is better to die fighting than to allow this to occur
Do you think this relates to «Boundaries» for formalizing a bare-bones morality ?
& Davidad’s Night Watchman
It’s related in that you’re all talking about maintaining some parts of the status quo, but I think the instrumental technologies (human-directed services vs. agential AIs that directly care about maintaining status-quo boundaries) are pretty different, as are all the arguments related to those technologies.