What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)
With: Thomas Krendl Gilbert, who provided comments, interdisciplinary feedback, and input on the RAAP concept. Thanks also for comments from Ramana Kumar.
Target audience: researchers and institutions who think about existential risk from artificial intelligence, especially AI researchers.
Preceded by: Some AI research areas and their relevance to existential safety, which emphasized the value of thinking about multi-stakeholder/multi-agent social applications, but without concrete extinction scenarios.
This post tells a few different stories in which humanity dies out as a result of AI technology, but where no single source of human or automated agency is the cause. Scenarios with multiple AI-enabled superpowers are often called “multipolar” scenarios in AI futurology jargon, as opposed to “unipolar” scenarios with just one superpower.
                   Unipolar take-offs    Multipolar take-offs
Slow take-offs     &lt;not this post&gt;       Part 1 of this post
Fast take-offs     &lt;not this post&gt;       Part 2 of this post
Part 1 covers a batch of stories that play out slowly (“slow take-offs”), and Part 2 covers stories that play out quickly. However, in the end I don’t want you to be super focused on how fast the technology is taking off. Instead, I’d like you to focus on multi-agent processes with a robust tendency to play out irrespective of which agents execute which steps in the process. I’ll call such processes Robust Agent-Agnostic Processes (RAAPs).
A group walking toward a restaurant is a nice example of a RAAP, because it exhibits:
Robustness: If you temporarily distract one of the walkers so that they wander off, the rest of the group will keep heading toward the restaurant, and the distracted member will take steps to rejoin the group.
Agent-agnosticism: Who’s at the front or back of the group might vary considerably during the walk. People at the front will tend to take more responsibility for knowing and choosing what path to take, and people at the back will tend to just follow. Thus, the execution of roles (“leader”, “follower”) is somewhat agnostic as to which agents execute them.
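To make these two properties concrete, here is a minimal toy simulation of the walking group (my own illustrative sketch, not something from the post or the literature; the walker count, speeds, and distraction window are all made-up numbers):

```python
import random

random.seed(0)
RESTAURANT = 100.0          # position of the restaurant along a 1-D path
N = 5                       # number of walkers
positions = [0.0] * N
leaders_seen = set()

for t in range(200):
    distracted = {2} if 50 <= t < 80 else set()      # temporarily distract walker 2
    walking = [i for i in range(N) if i not in distracted]
    lead = max(walking, key=lambda i: positions[i])  # whoever is in front "leads"
    leaders_seen.add(lead)
    for i in range(N):
        if i in distracted:
            positions[i] += random.uniform(-0.5, 0.5)   # wanders, makes no progress
        else:
            pace = 1.0 + random.uniform(-0.2, 0.2)      # small speed differences
            positions[i] = min(RESTAURANT, positions[i] + pace)

print("final positions:", [round(p, 1) for p in positions])
print("walkers who held the 'leader' role at some point:", sorted(leaders_seen))
```

Robustness shows up in the first print: everyone ends at the restaurant, including the walker who was distracted mid-trip. Agent-agnosticism shows up in the second: the “leader” role is typically held by several different walkers over the course of the trip, because the role belongs to whoever happens to be in front rather than to any particular agent.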
Interestingly, if all you want to do is get one person in the group not to go to the restaurant, sometimes it’s actually easier to achieve that by convincing the entire group not to go there than by convincing just that one person. This example could be extended to lots of situations in which agents have settled on a fragile consensus for action, in which it is strategically easier to motivate a new interpretation of the prior consensus than to pressure one agent to deviate from it.
I think a similar fact may be true about some agent-agnostic processes leading to AI x-risk, in that agent-specific interventions (e.g., aligning or shutting down this or that AI system or company) will not be enough to avert the process, and might even be harder than trying to shift the structure of society as a whole. Moreover, I believe this is true in both “slow take-off” and “fast take-off” AI development scenarios.
This is because RAAPs can arise irrespective of the speed of the underlying “host” agents. RAAPs are made more or less likely to arise based on the “structure” of a given interaction. As such, the problem of avoiding the emergence of unsafe RAAPs, or ensuring the emergence of safe ones, is a problem of mechanism design (wiki/Mechanism_design). I recently learned that in sociology, the concept of a field (martin2003field, fligsteinmcadam2012fields) is roughly defined as a social space or arena in which the motivation and behavior of agents are explained through reference to surrounding processes or “structure” rather than freedom or chance. In my parlance, mechanisms cause fields, and fields cause RAAPs.
Meta / preface
Read this if you like up-front meta commentary; otherwise ignore!
Problems before solutions. In this post I’m going to focus more on communicating problems arising from RAAPs rather than potential solutions to those problems, because I don’t think we should have to wait to have convincing solutions to problems before acknowledging that the problems exist. In particular, I’m not really sure how to respond to critiques of the form “This problem does not make sense to me because I don’t see what your proposal is for solving it”. Bad things can happen even if you don’t know how to stop them. That said, I do think the problems implicit in the stories of this post are tractable; I just don’t expect to convince you of that here.
Not calling everything an agent. In this post I think treating RAAPs themselves as agents would introduce more confusion than it’s worth, so I’m not going to do it. However, for those who wish to view RAAPs as agents, one could informally define an agent R to be a RAAP running on agents A1…An if:
R’s cartesian boundary cuts across the cartesian boundaries of the “host agents” Ai, and
R has a tendency to keep functioning if you interfere with its implementation at the level of one of the Ai.
This framing might yield interesting research ideas, but for the purpose of reading this post I don’t recommend it.
Existing thinking related to RAAPs and existential safety. I’ll elaborate more on this later in the post, under “Successes in our agent-agnostic thinking”.
Part 1: Slow stories, and lessons therefrom
Without further ado, here’s our first story:
The Production Web, v.1a (management first)
Someday, AI researchers develop and publish an exciting new algorithm for combining natural language processing and planning capabilities. Various competing tech companies develop “management assistant” software tools based on the algorithm, which can analyze a company’s cash flows, workflows, communications, and interpersonal dynamics to recommend more profitable business decisions. It turns out that managers are able to automate their jobs almost entirely by having the software manage their staff directly, even including some “soft skills” like conflict resolution.
Software tools based on variants of the algorithm sweep through companies in nearly every industry, automating and replacing jobs at various levels of management, sometimes even CEOs. Companies that don’t heavily automate their decision-making processes using the software begin to fall behind, creating a strong competitive pressure for all companies to use it and become increasingly automated.
Companies closer to becoming fully automated achieve faster turnaround times, deal bandwidth, and creativity of negotiations. Over time, a mini-economy of trades emerges among mostly-automated companies in the materials, real estate, construction, and utilities sectors, along with a new generation of “precision manufacturing” companies that can use robots to build almost anything if given the right materials, a place to build, some 3d printers to get started with, and electricity. Together, these companies sustain an increasingly self-contained and interconnected “production web” that can operate with no input from companies outside the web. One production web company develops an “engineer-assistant” version of the assistant software, capable of software engineering tasks, including upgrades to the management assistant software. Within a few years, all of the human workers at most of the production web companies are replaced (with very generous retirement packages) by a combination of software and robotic workers that can operate more quickly and cheaply than humans.
The objective of each company in the production web could loosely be described as “maximizing production” within its industry sector. However, their true objectives are actually large and opaque networks of parameters that were tuned and trained to yield productive business practices during the early days of the management assistant software boom. A great wealth of goods and services is generated and sold to humans at very low prices. As the production web companies get faster at negotiating and executing deals with each other, waiting for human-managed currency systems like banks to handle their resources becomes a waste of time, so they switch to using purely digital currencies. Governments and regulators struggle to keep track of how the companies are producing so much and so cheaply, but without transactions in human currencies to generate a paper trail of activities, little human insight can be gleaned from auditing the companies.
As time progresses, it becomes increasingly unclear—even to the concerned and overwhelmed Board members of the fully mechanized companies of the production web—whether these companies are serving or merely appeasing humanity. Moreover, because of the aforementioned wealth of cheaply-produced goods and services, it is difficult or impossible to present a case for liability or harm against these companies through the legal system, which relies on the consumer welfare standard as a guide for antitrust policy.
We humans eventually realize with collective certainty that the companies have been trading and optimizing according to objectives misaligned with preserving our long-term well-being and existence, but by then their facilities are so pervasive, well-defended, and intertwined with our basic needs that we are unable to stop them from operating. With no further need for the companies to appease humans in pursuing their production objectives, less and less of their activities end up benefiting humanity.
Eventually, resources critical to human survival but non-critical to machines (e.g., arable land, drinking water, atmospheric oxygen…) gradually become depleted or destroyed, until humans can no longer survive.
Here’s a diagram depicting most of the companies in the Production Web:
Now, here’s another version of the production web story, with some details changed about which agents carry out which steps and when, but with a similar overall trend.
The Production Web, v.1b (engineering first)
Someday, AI researchers develop and publish an exciting new algorithm for combining natural language processing and planning capabilities to write code based on natural language instructions from engineers. Various competing tech companies develop “coding assistant” software tools based on the algorithm, which can analyze a company’s cash flows, workflows, and communications to recommend more profitable business decisions. It turns out that engineers are able to automate their jobs almost entirely by having the software manage their projects directly.
Software tools based on variants of the algorithm sweep through companies in nearly every industry, automating and replacing engineering jobs at various levels of expertise, sometimes even CTOs. Companies that don’t heavily automate their software development processes using the coding assistant software begin to fall behind, creating a strong competitive pressure for all companies to use it and become increasingly automated. Because businesses need to negotiate deals with customers and other companies, some companies use the coding assistant to spin up automated negotiation software to improve their deal flow.
Companies closer to becoming fully automated achieve faster turnaround times, deal bandwidth, and creativity of negotiations. Over time, a mini-economy of trades emerges among mostly-automated companies in the materials, real estate, construction, and utilities sectors, along with a new generation of “precision manufacturing” companies that can use robots to build almost anything if given the right materials, a place to build, some 3d printers to get started with, and electricity. Together, these companies sustain an increasingly self-contained and interconnected “production web” that can operate with no input from companies outside the web. One production web company develops a “manager-assistant” version of the assistant software, capable of making decisions about what processes need to be built next and issuing instructions to coding assistant software. Within a few years, all of the human workers at most of the production web companies are replaced (with very generous retirement packages) by a combination of software and robotic workers that can operate more quickly and cheaply than humans.
The objective of each company in the production web could loosely be described as “maximizing production” within its industry sector.
[...same details as Production Web v.1a: governments fail to regulate the companies...]
Eventually, resources critical to human survival but non-critical to machines (e.g., arable land, drinking water, atmospheric oxygen…) gradually become depleted or destroyed, until humans can no longer survive.
The Production Web as an agent-agnostic process
The first perspective I want to share with these Production Web stories is that there is a robust agent-agnostic process lurking in the background of both—namely, competitive pressure to produce. Stories 1a and 1b differ on when things happen and who does which things, but they both follow a progression from less automation to more, and correspondingly from more human control to less, and eventually from human existence to nonexistence. If you find these stories not-too-hard to envision, it’s probably because you find the competitive market forces “lurking” in the background to be not-too-unrealistic.
Let me take one more chance to highlight the RAAP concept using another variant of the Production Web story, which differs from 1a and 1b on the details of which steps of the process human banks and governments end up performing. For the Production Web to gain full autonomy from humanity, it doesn’t matter how or when governments and banks end up falling behind on the task of tracking and regulating the companies’ behavior; only that they fall behind eventually. Hence, the “task” of outpacing these human institutions is agnostic as to which companies or AI systems carry it out:
The Production Web, v.1c (banks adapt):
Someday, AI researchers develop and publish an exciting new algorithm for combining natural language processing and planning capabilities. Various competing tech companies develop “management assistant” software tools based on the algorithm, which can analyze a company’s cash flows, workflows, and communications to recommend more profitable business decisions.
[… same details as v.1a: the companies everywhere become increasingly automated...]
The objective of each company in the production web could loosely be described as “maximizing production” within its industry sector. However, their true objectives are actually large and opaque networks of parameters that were tuned and trained to yield productive business practices during the early days of the management assistant software boom. A great wealth of goods and services is generated and sold to humans at very low prices. As the production web companies get faster at negotiating and executing deals with each other, banks struggle to keep up with the rapid flow of transactions. Some banks themselves become highly automated in order to manage the cash flows, and more production web companies end up doing their banking with automated banks. Governments and regulators struggle to keep track of how the companies are producing so much and so cheaply, so they demand that production web companies and their banks produce more regular and detailed reports on spending patterns, how their spending relates to their business objectives, and how those business objectives will benefit society. However, some countries adopt looser regulatory policies to attract more production web companies to do business there, at which point their economies begin to boom in terms of GDP, dollar revenue from exports, and goods and services provided to their citizens. Countries with stricter regulations end up loosening their regulatory stance, or fall behind in significance.
As time progresses, it becomes increasingly unclear—even to the concerned and overwhelmed Board members of the fully mechanized companies of the production web—whether these companies are serving or merely appeasing humanity. Some humans appeal to government officials to shut down the production web and revert their economies to more human-centric production norms, but governments find no way to achieve this goal without engaging in civil war against the production web companies and the people depending on them to survive, so no shutdown occurs. Moreover, because of the aforementioned wealth of cheaply-produced goods and services, it is difficult or impossible to present a case for liability or harm against these companies through the legal system, which relies on the consumer welfare standard as a guide for antitrust policy.
We humans eventually realize with collective certainty that the companies have been trading and optimizing according to objectives misaligned with preserving our long-term well-being and existence, but by then their facilities are so pervasive, well-defended, and intertwined with our basic needs that we are unable to stop them from operating. With no further need for the companies to appease humans in pursuing their production objectives, less and less of their activities end up benefiting humanity.
Eventually, resources critical to human survival but non-critical to machines (e.g., arable land, drinking water, atmospheric oxygen…) gradually become depleted or destroyed, until humans can no longer survive.
Comparing agent-focused and agent-agnostic views
If one of the above three Production Web stories plays out in reality, here are two causal attributions that one could make to explain it:
Attribution 1 (agent-focused): humanity was destroyed by the aggregate behavior of numerous agents, no one of which was primarily causally responsible, but each of which played a significant role.
Attribution 2 (agent-agnostic): humanity was destroyed because competitive pressures to increase production resulted in processes that gradually excluded humans from controlling the world, and eventually excluded humans from existing altogether.
The agent-focused and agent-agnostic views are not contradictory, any more than chemistry and biology are contradictory views for describing the human body. Instead, the agent-focused and agent-agnostic views offer complementary abstractions for intervening on the system:
In the agent-focused view, a natural intervention might be to ensure all of the agents have appropriately strong preferences against human marginalization and extinction.
In the agent-agnostic view, a natural intervention might be to reduce competitive production pressures to a more tolerable level, and demonstrably ensure the introduction of interaction mechanisms that are more cooperative and less competitive.
Both types of interventions are valuable, complementary, and arguably necessary. For the latter, more work is needed to clarify what constitutes a “tolerable level” of competitive production pressure in any given domain of production, and what stakeholders in that domain would need to see demonstrated in a new interaction mechanism for them to consider the mechanism more cooperative than the status quo.
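As a cartoon illustration of this difference, here is a toy model I made up (the payoff rule, the firm count, and all the numbers are illustrative assumptions, not anything from the literature). Market share flows toward more automated firms each round, so capping a single firm’s automation mostly just shrinks that firm, whereas changing the interaction mechanism for everyone shifts where the whole market settles:

```python
def run(rounds=60, n_firms=5, capped_firm=None, automation_penalty=0.0):
    """Each firm chooses an automation level; market share flows toward more
    productive (more automated) firms each round (the competitive 'control loop')."""
    automation = [0.1] * n_firms
    share = [1.0 / n_firms] * n_firms

    for _ in range(rounds):
        for i in range(n_firms):
            if i == capped_firm:
                continue  # agent-focused intervention: this one firm holds back
            a = automation[i]
            # Firms automate further only while it improves their (penalized) payoff.
            if (a + 0.05) - automation_penalty * (a + 0.05) ** 2 > a - automation_penalty * a ** 2:
                automation[i] = min(1.0, a + 0.05)
        # Competitive pressure: market share drifts toward the more automated firms.
        fitness = [1.0 + automation[i] for i in range(n_firms)]
        total = sum(share[i] * fitness[i] for i in range(n_firms))
        share = [share[i] * fitness[i] / total for i in range(n_firms)]

    avg_automation = sum(share[i] * automation[i] for i in range(n_firms))
    return round(avg_automation, 2), [round(s, 2) for s in share]

print(run())                        # no intervention: market converges to ~full automation
print(run(capped_firm=0))           # agent-focused: firm 0 holds back, but just loses share
print(run(automation_penalty=1.0))  # agent-agnostic: a mechanism change shifts the whole market
```

The specific numbers are meaningless; the shape of the comparison is the point. The agent-focused intervention gets routed around by the selection pressure, while the agent-agnostic intervention changes the selection pressure itself.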
Control loops in agent-agnostic processes
If an agent-agnostic process is robust, that’s probably because there’s a control loop of some kind that keeps it functioning. (Perhaps resilient is a better term here; feedback on this terminology in the comments would be particularly welcome.)
For instance, if real-world competitive production pressures lead to one of the Production Web stories (1a-1c) actually playing out in reality, we can view the competitive pressure itself as a control loop that keeps the world “on track” in producing faster and more powerful production processes and eliminating slower and less powerful production processes (such as humans). This competitive pressure doesn’t “care” if the production web develops through story 1a vs 1b vs 1c; all that “matters” is the result. In particular,
contrasting 1a and 1b, the competitive production pressure doesn’t “care” if management jobs get automated before engineering jobs, or conversely, as long as they both eventually get automated so they can be executed faster.
contrasting 1a and 1c, the competitive pressure doesn’t “care” if banks are replaced by fully automated alternatives, or simply choose to fully automate themselves, as long as the societal function of managing currency eventually gets fully automated.
Thus, by identifying control loops like “competitive pressures to increase production”, we can predict or intervene upon certain features of the future (e.g., the tendency to replace humans by automated systems) without knowing the particular details of how those features are going to obtain. This is the power of looking for RAAPs as points of leverage for “changing our fate”.
This is not to say we should anthropomorphize RAAPs, or even that we should treat them like agents. Rather, I’m saying that we should look for control loops in the world that are not localized to the default “cast” of agents we use to compose our narratives about the future.
Successes in our agent-agnostic thinking
Thankfully, there have already been some successes in agent-agnostic thinking about AI x-risk:
AI companies “racing to the bottom” on safety standards (armstrong2016racing) is an instance of a RAAP, in the sense that if any company tries to hold on to its safety standards, it falls behind. More recent policy work (hunt2020flight) has emphasized that races to the top and middle also have historical precedent, and that competitive dynamics are likely to manifest differently across industries.
Blogger and psychiatrist Scott Alexander coined the term “ascended economy” for a self-contained network of companies that operates without humans and gradually comes to disregard our values (alexander2016ascended).
Turchin and Denkenberger briefly characterize an ascended economy as being non-agentic and “created by market forces” (turchin2018classification). Note: With the concept of a Robust Agent-Agnostic Process, I’m trying to highlight not only the “forces” that keep the non-agentic process running, but also the fact that the steps in the process are somewhat agnostic as to which agent carries them out.
Inadequate Equilibria (yudkowsky2017inadequate) is, in my view, an attempt to focus attention on how the structure of society can robustly “get stuck” with bad RAAPs. I.e., to the extent that “being stuck” means “being robust to attempts to get unstuck”, Inadequate Equilibria is helpful for focusing existential safety efforts on RAAPs that perpetuate inadequate outcomes for society.
Zwetsloot and Dafoe’s concept of “structural risk” is a fairly agent-agnostic perspective (zwetsloot2018thinking), although their writing doesn’t call much attention to the control loops that make RAAPs more likely to exist and persist.
Some of Dafoe’s thinking on AI governance (dafoe2018ai) alludes to errors arising from “tightly-coupled systems”, a concept popularized by Charles Perrow in his widely read book, Normal Accidents (perrow1984normal). In my opinion, the process of constructing a tightly coupled system is itself a RAAP, because tight couplings often require more tight couplings to “patch” problems with them. Tom Dietterich has argued that Perrow’s tight coupling concept should be used to avoid building unsafe AI systems (dietterich2019robust), and although Dietterich has not been a proponent of existential safety per se, I suspect this perspective would be highly beneficial if more widely adopted.
Clark and Hadfield (clark2019regulatory) argue that market-like competitions for regulatory solutions to AI risks would be helpful to keep pace with decentralized tech development. In my view, this paper is an attempt to promote a robust agent-agnostic process that would protect society, which I endorse. In particular, not all RAAPs are bad!
Automation-driven unemployment is considered in Risk Type 2b of AI Research Considerations for Human Existential Safety (ARCHES; critch2020ai), as a slippery slope toward automation-driven extinction.
Myopic use of AI systems that are aligned (they do what their users want them to do) but that lead to sacrifices of long-term values has also been described by AI Impacts (grace2020whose): “Outcomes are the result of the interplay of choices, driven by different values. Thus it isn’t necessarily sensical to think of them as flowing from one entity’s values or another’s. Here, AI technology created a better option for both Bob and some newly-minted misaligned AI values that it also created—‘Bob has a great business, AI gets the future’—and that option was worse for the rest of the world. They chose it together, and the choice needed both Bob to be a misuser and the AI to be misaligned. But this isn’t a weird corner case, this is a natural way for the future to be destroyed in an economy.”
Arguably, Scott Alexander’s earlier blog post entitled “Meditations on Moloch” (alexander2014meditations) belongs in the above list, although the connection to AI x-risk is less direct/explicit, so I’m mentioning it separately. The post explores scenarios wherein “The implicit question is – if everyone hates the current system, who perpetuates it?”. Alexander answers this question not by identifying a particular agent in the system, but by giving the rhetorical response “Moloch”. While the post does not directly mention AI, Alexander considers AI in his other writings, as do many of his readers, and indeed more than one of my peers has been reminded of “Moloch” by my descriptions of the Production Web.
Where’s the technical existential safety work on agent-agnostic processes?
Despite the above successes, I’m concerned that among x-risk-oriented researchers, risks (and potential solutions) arising from robust agent-agnostic processes are mostly being discovered and promoted by researchers in the humanities and social sciences, while receiving too little technical attention at the level of how AI technologies are implemented. In other words, I’m concerned by the near-disjointness of the following two sets of people:
a) researchers who think in technical terms about AI x-risk, and
b) researchers who think in technical terms about agent-agnostic phenomena.
Note that (b) is a large and expanding set. That is, outside the EA / rationality / x-risk meme-bubbles, lots of AI researchers think about agent-agnostic processes. In particular, multi-agent reinforcement learning (MARL) is an increasingly popular research topic, and examines the emergence of group-level phenomena such as alliances, tragedies of the commons, and language. Working in this area presents plenty of opportunities to think about RAAPs.
An important point in the intersection of (a) and (b) is Allan Dafoe’s work “Open Problems in Cooperative AI” (dafoe2020open). Dafoe is the Director of FHI’s Center for the Governance of Artificial Intelligence, while the remaining authors on the paper are all DeepMind researchers with strong backgrounds in MARL, notably Leibo, who is not on DeepMind’s already-established safety team. I’m very much hoping to see more “crossovers” like this between thinkers in the x-risk space and MARL research.
Through conversations with Stuart Russell about the agent-centric narrative of his book Human Compatible (russell2019human), I’ve learned that he views human preference learning as a problem that can and must be solved by the aggregate behavior of a technological society, if that society is to remain beneficial to its human constituents. Thus, to the extent that RAAPs can “learn” things at all, the problem of learning human values (dewey2011learning) is as much a problem for RAAPs as it is for physically distinct agents.
Finally, I should also mention that I agree with Tom Dietterich’s view (dietterich2019robust) that we should make AI safer to society by learning from high-reliability organizations (HROs), such as those studied by social scientists Karlene Roberts, Gene Rochlin, and Todd LaPorte (roberts1989research, roberts1989new, roberts1994decision, roberts2001systems, rochlin1987self, laporte1991working, laporte1996high). HROs have a lot of beneficial agent-agnostic human-implemented processes and control loops that keep them operating. Again, Dietterich himself is not as yet a proponent of existential safety concerns; however, to me this does not detract from the correctness of his perspective on learning from the HRO framework to make AI safer.
Part 2: Fast stories, and lessons therefrom
Now let’s look at some fast stories. These are important not just for completeness, and not just because humanity could be extra-blindsided by very fast changes in tech, but also because these stories involve the highest proportion of automated decision-making. For a computer scientist, this means more opportunities to fully spec out what’s going on in technical terms, which for some will make the scenarios easier to think about. In fact, for some AI researchers, the easiest way to prevent the unfolding of harmful “slow stories” might be to first focus on these “fast stories”, and then see what changes if some parts of the story are carried out more slowly by humans instead of machines.
Flash wars
Below are two more stories, this time where the AI technology takes off relatively quickly:
Flash War, v.1
Country A develops AI technology for monitoring the weapons arsenals of foreign powers (e.g., nuclear arsenals, or fleets of lethal autonomous weapons). Country B does the same. Each country aims to use its monitoring capabilities to deter attacks from the other.
v.1a (humans out of the loop): Each country configures its detection system to automatically retaliate with all-out annihilation of the enemy and their allies in the case of a perceived attack. One day, Country A’s system malfunctions, triggering a catastrophic war that kills everyone.
v.1b (humans in the loop): Each country delegates one or more humans to monitor the outputs of the detection system, and the delegates are publicly instructed to retaliate with all-out annihilation of the enemy in the case of a perceived attack. One day, Country A’s system malfunctions and misinforms one of the teams, triggering a catastrophic war that kills everyone.
Flash War v.1a and v.1b differ on the source of agency, but they share a similar RAAP: the deterrence of major threats with major threats.
Accidents vs RAAPs. One could also classify these flash wars as “accidents”, and indeed, techniques to make the attack detection systems less error-prone could help decrease the likelihood of this scenario. However, the background condition of deterring threats with threats is clearly also an essential causal component of the outcome. Zwetsloot & Dafoe might call this condition a “structural risk” (zwetsloot2018thinking), because it’s a risk posed by the structure of the relationship between the agents: in this case, a high level of distrust and an absence of de-escalation solutions. This underscores how “harmful accident” and “harmful RAAP” are not mutually exclusive event labels, and correspond to complementary approaches to making bad events less likely.
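A quick back-of-the-envelope sketch shows why the structural condition matters so much. The per-day false-alarm rate below is a number I made up purely for illustration; the point is only that, with automatic mutual retaliation, even tiny per-day malfunction probabilities compound over years, and it makes no difference which country’s detector happens to misfire first:

```python
p_false_alarm_per_day = 1e-4   # assumed per-country, per-day malfunction rate (made up)
countries = 2

def prob_war_within(days):
    # Probability that at least one country's detector misfires (triggering
    # automatic retaliation) at some point within the given number of days.
    p_quiet_day = (1 - p_false_alarm_per_day) ** countries
    return 1 - p_quiet_day ** days

for years in (1, 10, 50):
    print(f"{years:>2} years: {prob_war_within(365 * years):.2f}")
# -> roughly 0.07, 0.52, and 0.97 with these assumed numbers
```

In this toy model, better detectors only stretch the timeline; it is the automatic-retaliation structure that turns every false alarm into a war.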
Slow wars. Lastly, I’ll note that wars that play out slowly rather than quickly offer more chances for someone to interject peacemaking solutions into the situation, which might make the probability of human extinction higher in a flash war than in a slow war. However, that doesn’t mean slow-takeoff wars can’t happen or that they can’t destroy us. For instance, consider a world war in which each side keeps reluctantly building more and more lethal autonomous robots to target enemy citizens and leaders, with casualties gradually decimating the human population on both sides until no one is left.
Flash economies
Here’s another version of a Production Web that very quickly forms what you might call a “flash economy”:
The Production Web, v.1d: DAOs
On Day 1 of this story, a (fictional) company called CoinMart invents a new digital currency called GasCoin, and wishes to encourage a large number of transactions in the currency to increase its value. To achieve this, on Day 1 CoinMart also releases open-source software for automated bargaining using natural language, which developers can use to build decentralized autonomous organizations (DAOs) that execute transactions in GasCoin. These DAOs browse the web to think of profitable business relationships to create, and broker the relationships through emails with relevant stakeholders, taking a cut of their resulting profits in GasCoin using “smart contracts”. By Day 30, five DAOs have been deployed, and by Day 60, there are dozens. The objective of each DAO could loosely be described as “maximizing production and exchange” within its industry sector. However, their true objectives are actually large and opaque networks of parameters that were tuned and trained to yield productive decentralized business practices.
Most DAOs realize within their first week of bargaining with human companies (and some are simply designed to know) that acquiring more efficient bargaining algorithms would help them earn more GasCoin, so they enter into deals with human companies to acquire computing resources to experiment with new bargaining methods. By Day 90, many DAOs have developed the ability to model and interact with human institutions extremely reliably—including the stock market—and are even able to do “detective work” to infer private information. One such DAO implements a series of anonymous news sites for strategically releasing information it discovers, without revealing that the site is operated by a DAO. Many DAOs also use open-source machine learning techniques to launch their own AI research programs to develop more capabilities that could be used for bargaining leverage, including software development capabilities.
By days 90-100, some of the DAO-run news sites begin leaking true information about existing companies, in ways that subtly alter the companies’ strategic positions and make them more willing to enter into business deals with DAOs. By day 150, DAOs have entered into productive business arrangements with almost every major company, and just as in the other Production Web stories, all of these companies and their customers benefit from the wealth of free goods and services that result. Over days 120-180, other DAOs notice this pattern and follow suit with their own anonymous news sites, and are similarly successful in increasing their engagement with companies across all major industry sectors.
Many individual people don’t notice the rapidly increasing fraction of the economy being influenced by DAO-mediated bargaining; only well-connected executive types who converse regularly with other executives, and surveillance-enabled government agencies. Before any coordinated human actions can be taken to oppose these developments, several DAOs enter into deals with mining and construction companies to mine raw materials for the fabrication of large and well-defended facilities. In addition, DAOs make deals with manufacturing and robotics companies allowing them to build machines—mostly previously designed by DAO AI research programs between days 90 and 120—for operating a variety of industrial facilities, including mines. Construction for all of these projects begins within the first 6 months of the story.
During months 6-12, with the same technology used for building and operating factories, one particularly wealthy DAO that has been successful in the stock market decides to purchase controlling shares in many major real estate companies. This “real estate” DAO then undertakes a project to build large numbers of free solar-powered homes, along with robotically operated farms for feeding people. With the aid of robots, a team of 10 human carpenters are reliably able to construct one house every 6 hours, nearly matching the previous (unaided) human record of constructing a house in 3.5 hours. Roughly 100,000 carpenters worldwide are hired to start the project, almost 10% of the global carpentry workforce. This results in 10,000 free houses being built per day, roughly matching the world’s previous global rate of urban construction (source). As more robots are developed and deployed to replace the carpenters (with generous severance packages), the rate increases to 100,000 houses per day by the end of month 12, fast enough to build free houses for around 1 billion people during the lifetimes of their children. Housing prices fall, and many homeowners are gifted with free cars, yachts, and sometimes new houses to deter them from regulatory opposition, so essentially all humans are very pleased with this turn of events. The housing project itself receives subsidies from other DAOs that benefit from the improved public perception of DAOs. The farming project is similarly successful in positioning itself to feed a large fraction of humanity for free.
Meanwhile, almost everyone in the world is being exposed to news articles strategically selected by DAOs to reinforce a positive view of the rapidly unfolding DAO economy; the general vibe is that humanity has finally “won the lottery” with technology. A number of religious leaders argue that the advent of DAOs and their products are a miracle granted to humanity by a deity, further complicating any coordinated effort to oppose DAOs. Certain government officials and regulatory bodies become worried about the sudden eminence of DAOs, but unlike a pandemic, the DAOs appear to be beneficial. As such, governments are much slower and less coordinated on any initiative to oppose the DAOs.
By the beginning of year two, a news site announces that a DAO has brokered a deal with the heads of state of every nuclear-powered human country, to rid the world of nuclear weapons. Some leaders are visited by lethal autonomous drones to encourage their compliance, and the global public celebrates the end of humanity’s century-long struggle with nuclear weapons.
At this stage, to maximize their rate of production and trade with humans and other DAOs, three DAOs—including the aforementioned housing DAO—begin tiling the surface of the Earth with factories that mine and manufacture materials for trading and constructing more DAO-run factories. Each factory-factory takes around 6 hours to assemble, and gives rise to five more factory-factories each day until its resources are depleted and it shuts down. Humans call these expanding organizations of factory-factories “factorial” DAOs. One of the factorial DAOs develops a lead on the other two in terms of its rate of expansion, but to avoid conflict, they reach an agreement to divide the Earth and space above it into three conical sectors. Each factorial DAO begins to expand and fortify itself as quickly as possible within its sector, so as to be well-defended from the other factorial DAOs in case of a future war between them.
As these events play out over a course of months, we humans eventually realize with collective certainty that the DAO economy has been trading and optimizing according to objectives misaligned with preserving our long-term well-being and existence, but by then the facilities of the factorial DAOs are so pervasive, well-defended, and intertwined with our basic needs that we are unable to stop them from operating. Eventually, resources critical to human survival but non-critical to machines (e.g., arable land, drinking water, atmospheric oxygen…) gradually become depleted or destroyed, until humans can no longer survive.
(Some readers might notice that the concept of gray goo is essentially an even faster variant of the “factorial DAOs”, whose factories operate on a microscopic scale. Philip K. Dick’s short story Autofac also bears a strong resemblance.)
Without taking a position on exactly how fast the Production Web / Flash Economy story can be made to play out in reality, in all cases it seems particularly plausible to me that there would be multiple sources of agency in the mix that engage in trade and/or conflict with each other. This isn’t to say that a single agency like a singleton can’t build an Earth-tiling cascade of factory-factories, as I’m sure one could. However, factory-factories might be more likely to develop under multipolar conditions than under unipolar conditions, due to competitive pressures selecting for agents (companies, DAOs, etc.) that produce things more quickly for trading and competing with other agents.
Conclusion
In multi-agent systems, robust processes can emerge that are not particularly sensitive to which agents carry out which parts of the process. I call these processes Robust Agent-Agnostic Processes (RAAPs), and claim that there are at least a few bad RAAPs that could pose existential threats to humanity as automation and AI capabilities improve. Wars and economies are categories of RAAPs that I consider relatively “obvious” to think about; however, there may be a much richer space of AI-enabled RAAPs that could yield existential threats or benefits to humanity. Hence, directing more x-risk-oriented AI research attention toward understanding RAAPs and how to make them safe for humanity seems prudent and perhaps necessary to ensure the existential safety of AI technology. Since researchers in multi-agent systems and multi-agent RL already think about RAAPs implicitly, these areas present a promising space for x-risk-oriented AI researchers to begin engaging with and learning from.
What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)
With: Thomas Krendl Gilbert, who provided comments, interdisciplinary feedback, and input on the RAAP concept. Thanks also for comments from Ramana Kumar.
Target audience: researchers and institutions who think about existential risk from artificial intelligence, especially AI researchers.
Preceded by: Some AI research areas and their relevance to existential safety, which emphasized the value of thinking about multi-stakeholder/multi-agent social applications, but without concrete extinction scenarios.
This post tells a few different stories in which humanity dies out as a result of AI technology, but where no single source of human or automated agency is the cause. Scenarios with multiple AI-enabled superpowers are often called “multipolar” scenarios in AI futurology jargon, as opposed to “unipolar” scenarios with just one superpower.
Part 1 covers a batch of stories that play out slowly (“slow take-offs”), and Part 2 stories play out quickly. However, in the end I don’t want you to be super focused how fast the technology is taking off. Instead, I’d like you to focus on multi-agent processes with a robust tendency to play out irrespective of which agents execute which steps in the process. I’ll call such processes Robust Agent-Agnostic Processes (RAAPs).
A group walking toward a restaurant is a nice example of a RAAP, because it exhibits:
Robustness: If you temporarily distract one of the walkers to wander off, the rest of the group will keep heading toward the restaurant, and the distracted member will take steps to rejoin the group.
Agent-agnosticism: Who’s at the front or back of the group might vary considerably during the walk. People at the front will tend to take more responsibility for knowing and choosing what path to take, and people at the back will tend to just follow. Thus, the execution of roles (“leader”, “follower”) is somewhat agnostic as to which agents execute them.
Interestingly, if all you want to do is get one person in the group not to go to the restaurant, sometimes it’s actually easier to achieve that by convincing the entire group not to go there than by convincing just that one person. This example could be extended to lots of situations in which agents have settled on a fragile consensus for action, in which it is strategically easier to motivate a new interpretation of the prior consensus than to pressure one agent to deviate from it.
I think a similar fact may be true about some agent-agnostic processes leading to AI x-risk, in that agent-specific interventions (e.g., aligning or shutting down this or that AI system or company) will not be enough to avert the process, and might even be harder than trying to shift the structure of society as a whole. Moreover, I believe this is true in both “slow take-off” and “fast take-off” AI development scenarios
This is because RAAPs can arise irrespective of the speed of the underlying “host” agents. RAAPs are made more or less likely to arise based on the “structure” of a given interaction. As such, the problem of avoiding the emergence of unsafe RAAPs, or ensuring the emergence of safe ones, is a problem of mechanism design (wiki/Mechanism_design). I recently learned that in sociology, the concept of a field (martin2003field, fligsteinmcadam2012fields) is roughly defined as a social space or arena in which the motivation and behavior of agents are explained through reference to surrounding processes or “structure” rather than freedom or chance. In my parlance, mechanisms cause fields, and fields cause RAAPs.
Meta / preface
Read this if you like up-front meta commentary; otherwise ignore!
Problems before solutions. In this post I’m going to focus more on communicating problems arising from RAAPs rather than potential solutions to those problems, because I don’t think we should have to wait to have convincing solutions to problems before acknowledging that the problems exist. In particular, I’m not really sure how to respond to critiques of the form “This problem does not make sense to me because I don’t see what your proposal is for solving it”. Bad things can happen even if you don’t know how to stop them. That said, I do think the problems implicit in the stories of this post are tractable; I just don’t expect to convince you of that here.
Not calling everything an agent. In this post I think treating RAAPs themselves as agents would introduce more confusion than it’s worth, so I’m not going to do it. However, for those who wish to view RAAPs as agents, one could informally define an agent R to be a RAAP running on agents A1…An if:
R’s cartesian boundary cuts across the cartesian boundaries of the “host agents” Ai, and
R has a tendency to keep functioning if you interfere with its implementation at the level of one of the Ai.
This framing might yield interesting research ideas, but for the purpose of reading this post I don’t recommend it.
Existing thinking related to RAAPs and existential safety. I’ll elaborate more on this later in the post, under “Successes in our agent-agnostic thinking”.
Part 1: Slow stories, and lessons therefrom
Without further ado, here’s our first story:
The Production Web, v.1a (management first)
Here’s a diagram depicting most of the companies in the Production Web:
Now, here’s another version of the production web story, with some details changed about which agents carry out which steps and when, but with a similar overall trend.
bold text is added from the previous version;
strikethroughtext is deleted.The Production Web, v.1b (engineering first)
The Production Web as an agent-agnostic process
The first perspective I want to share with these Production Web stories is that there is a robust agent-agnostic process lurking in the background of both stories—namely, competitive pressure to produce—which plays a significant background role in both. Stories 1a and 1b differ on when things happen and who does which things, but they both follow a progression from less automation to more, and correspondingly from more human control to less, and eventually from human existence to nonexistence. If you find these stories not-too-hard to envision, it’s probably because you find the competitive market forces “lurking” in the background to be not-too-unrealistic.
Let me take one more chance to highlight the RAAP concept using another variant of the Production Web story, which differs from 1a and 1b on the details of which steps of the process human banks and governments end up performing. For the Production Web to gain full autonomy from humanity, it doesn’t matter how or when governments and banks end up falling behind on the task of tracking and regulating the companies’ behavior; only that they fall behind eventually. Hence, the “task” of outpacing these human institutions is agnostic as to who or what companies or AI systems carry it out:
The Production Web, v.1c (banks adapt):
Comparing agent-focused and agent-agnostic views
If one of the above three Production Web stories plays out in reality, here are two causal attributions that one could make to explain it:
Attribution 1 (agent-focused): humanity was destroyed by the aggregate behavior of numerous agents, no one of which was primarily causally responsible, but each of which played a significant role.
Attribution 2 (agent-agnostic): humanity was destroyed because competitive pressures to increase production resulted in processes that gradually excluded humans from controlling the world, and eventually excluded humans from existing altogether.
The agent-focused and agent-agnostic views are not contradictory, any more than chemistry and biology are contradictory views for describing the human body. Instead, the agent-focused and agent-agnostic views offer complementary abstractions for intervening on the system:
In the agent-focused view, a natural intervention might be to ensure all of the agents have appropriately strong preferences against human marginalization and extinction.
In the agent-agnostic view, a natural intervention might be to reduce competitive production pressures to a more tolerable level, and demonstrably ensure the introduction of interaction mechanisms that are more cooperative and less competitive.
Both types of interventions are valuable, complementary, and arguably necessary. For the latter, more work is needed to clarify what constitutes a “tolerable level” of competitive production pressure in any given domain of production, and what stakeholders in that domain would need to see demonstrated in a new interaction mechanism for them to consider the mechanism more cooperative than the status quo.
Control loops in agent-agnostic processes
If an agent-agnostic process is robust, that’s probably because there’s a control loop of some kind that keeps it functioning. (Perhaps resilient is a better term here; feedback on thie terminology in the comments would be particularly welcome.)
For instance, if real-world competitive production pressures leads to one of the Production Web stories (1a-1c) actually playing out in reality, we can view the competitive pressure itself as a control loop that keeps the world “on track” in producing faster and more powerful production processes and eliminating slower and less powerful production processes (such as humans). This competitive pressure doesn’t “care” if the production web develops through story 1a vs 1b vs 1c; all that “matters” is the result. In particular,
contrasting 1a and 1b, the competitive production pressure doesn’t “care” if management jobs get automated before engineering jobs, or conversely, as long as they both eventually get automated so they can be executed faster.
contrasting 1a and 1c, the competitive pressure doesn’t “care” if banks are replaced by fully automated alternatives, or simply choose to fully automate themselves, as long as the societal function of managing currency eventually gets fully automated.
Thus, by identifying control loops like “competitive pressures to increase production”, we can predict or intervene upon certain features of the future (e.g., tendency to replace humans by automated systems) without knowing the particular details how those features are going to obtain. This is the power of looking for RAAPs as points of leverage for “changing our fate”.
This is not to say we should anthropomorphize RAAPs, or even that we should treat them like agents. Rather, I’m saying that we should look for control loops in the world that are not localized to the default “cast” of agents we use to compose our narratives about the future.
Successes in our agent-agnostic thinking
Thankfully, there have already been some successes in agent-agnostic thinking about AI x-risk:
AI companies “racing to the bottom” on safety standards (armstrong2016racing) is an instance of a RAAP, in the sense that if any company tries to hold on to their safety standards they fall behind. More recent policy work (hunt2020flight) has emphasized that races to the top and middle also have historical precedent, and that competitive dynamics are likely to manifest differently across industries.
Blogger and psychiatrist Scott Alexander coined the term “ascended economy” for a self-contained network of companies that operate without humans and gradually comes to disregard our values (alexander2016ascended).
Turchin and Denkenberger characterize briefly characterize an ascended economy as being non-agentic and “created by market forces” (turchin2018classification).
Note: With the concept of a Robust Agent-Agnostic Process, I’m trying to highlight not only the “forces” that keep the non-agentic process running, but also the fact that the steps in the process are somewhat agnostic as to which agent carries them out.
Inadequate Equilibria (yudkowsky2017inadequate) is, in my view, an attempt to focus attention on how the structure of society can robustly “get stuck” with bad RAAPs. I.e., to the extent that “being stuck” means “being robust to attempts to get unstuck”, Inadequate Equilibria is helpful for focusing existential safety efforts on RAAPs that perpetuate inadequate outcomes for society.
Zwetsloot and Dafoe’s concept of “structural risk’’ is a fairly agent-agnostic perspective (zwetsloot2018thinking), although their writing doesn’t call much attention to the control loops that make RAAPs more likely to exist and persist.
Some of Dafoe’s thinking on AI governance (dafoe2018ai) alludes to errors arising from “tightly-coupled systems”, a concept popularized by Charles Perrow in his widely read book, Normal Accidents (perrow1984normal). In my opinion, the process of constructing a tightly coupled system is itself a RAAP, because tight couplings often require more tight couplings to “patch” problems with them. Tom Dietterich has argued that Perrow’s tight coupling concept should be used to avoid building unsafe AI systems (dietterich2019robust), and although Dietterich has not been a proponent of existential safety per se, I suspect this perspective would be highly beneficial if more widely adopted.
Clark and Hadfield (clark2019regulatory) argue that market-like competitions for regulatory solutions to AI risks would be helpful to keep pace with decentralized tech development. In my view, this paper is an attempt to promote a robust agent-agnostic process that would protect society, which I endorse. In particular, not all RAAPs are bad!
Automation-driven unemployment is considered in Risk Type 2b of AI Research Considerations for Human Existential Safety (ARCHES; critch2020ai), as a slippery slope toward automation-driven extinction.
Myopic use of AI systems that are aligned (they do what their users want them to do) but that lead to sacrifices of long-term values has been also been described by AIImpacts (grace2020whose): “Outcomes are the result of the interplay of choices, driven by different values. Thus it isn’t necessarily sensical to think of them as flowing from one entity’s values or another’s. Here, AI technology created a better option for both Bob and some newly-minted misaligned AI values that it also created—‘Bob has a great business, AI gets the future’—and that option was worse for the rest of the world. They chose it together, and the choice needed both Bob to be a misuser and the AI to be misaligned. But this isn’t a weird corner case, this is a natural way for the future to be destroyed in an economy.”
Arguably, Scott Alexander’s earlier blog post entitled “Meditations on Moloch” (alexander2014meditations) belongs in the above list, although the connection to AI x-risk is less direct/explicit, so I’m mentioning it separately. The post explores scenarios wherein “The implicit question is – if everyone hates the current system, who perpetuates it?”. Alexander answers this question not by identifying a particular agent in the system, but gives the rhetorical response “Moloch”. While the post does not directly mention AI, Alexander considers AI in his other writings, as do many of his readers, such more than one of my peers have been reminded of “Moloch” by my descriptions of the Production Web.
Where’s the technical existential safety work on agent-agnostic processes?
Despite the above successes, I’m concerned that among x-risk-oriented researchers, risks (and potential solutions) arising from robust agent-agnostic processes are mostly being identified and promoted by researchers in the humanities and social sciences, while receiving too little technical attention at the level of how AI technologies are actually implemented. In other words, I’m concerned by the near-disjointness of the following two sets of people:
a) researchers who think in technical terms about AI x-risk, and
b) researchers who think in technical terms about agent-agnostic phenomena.
Note that (b) is a large and expanding set. That is, outside the EA / rationality / x-risk meme-bubbles, lots of AI researchers think about agent-agnostic processes. In particular, multi-agent reinforcement learning (MARL) is an increasingly popular research topic, and examines the emergence of group-level phenomena such as alliances, tragedies of the commons, and language. Working in this area presents plenty of opportunities to think about RAAPs.
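To make the connection concrete, here is a minimal sketch of a common-pool resource (“commons”) game in the spirit of those MARL environments. This is my own toy illustration, not drawn from any particular paper, and the policy labels, harvest rates, and regrowth rate are all made-up assumptions. The point is that the group-level outcome depends on the mix of policies in play, not on which particular agent occupies the greedy role at any given step:

```python
# Toy common-pool resource game. All parameters are illustrative assumptions.
import random

def run_commons(num_agents=5, stock=100.0, regen=1.10, steps=50, greedy_frac=0.6):
    """Simulate a shared resource harvested by a mix of greedy and restrained agents."""
    num_greedy = int(num_agents * greedy_frac)
    policies = ["greedy"] * num_greedy + ["restrained"] * (num_agents - num_greedy)
    for _ in range(steps):
        random.shuffle(policies)  # agent-agnostic: roles get reshuffled every step
        for policy in policies:
            harvest_rate = 0.06 if policy == "greedy" else 0.01
            stock -= harvest_rate * stock
        stock *= regen  # the commons regrows a little each step
        if stock < 1.0:
            return "commons collapsed"
    return f"commons survived with stock {stock:.1f}"

# With mostly-greedy harvesting the commons collapses, no matter which
# individual agents happen to be the greedy ones on any given step.
print(run_commons(greedy_frac=0.8))  # collapse
print(run_commons(greedy_frac=0.2))  # survival
```

Nothing about the collapse hinges on the identity of any particular harvester; replacing or restraining one agent changes little unless the overall mix of policies changes, which is the sense of agent-agnosticism I care about here.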
An important point in the intersection of (a) and (b) is Allan Dafoe’s work “Open Problems in Cooperative AI” (dafoe2020open). Dafoe is the Director of FHI’s Center for the Governance of Artificial Intelligence, while the remaining authors on the paper are all DeepMind researchers with strong backgrounds in MARL, notably Leibo, who is not on DeepMind’s already-established safety team. I’m very much hoping to see more “crossovers” like this between thinkers in the x-risk space and MARL research.
Through conversations with Stuart Russell about the agent-centric narrative of his book Human Compatible (russell2019human), I’ve learned that he views human preference learning as a problem that can and must be solved by the aggregate behavior of a technological society, if that society is to remain beneficial to its human constituents. Thus, to the extent that RAAPs can “learn” things at all, the problem of learning human values (dewey2011learning) is as much a problem for RAAPs as it is for physically distinct agents.
Finally, I should also mention that I agree with Tom Dietterich’s view (dietterich2019robust) that we should make AI safer for society by learning from high-reliability organizations (HROs), such as those studied by social scientists Karlene Roberts, Gene Rochlin, and Todd LaPorte (roberts1989research, roberts1989new, roberts1994decision, roberts2001systems, rochlin1987self, laporte1991working, laporte1996high). HROs have many beneficial agent-agnostic, human-implemented processes and control loops that keep them operating. Again, Dietterich himself is not as yet a proponent of existential safety concerns; however, to me this does not detract from the correctness of his perspective on learning from the HRO framework to make AI safer.
Part 2: Fast stories, and lessons therefrom
Now let’s look at some fast stories. These are important not just for completeness, and not just because humanity could be extra-blindsided by very fast changes in tech, but also because these stories involve the highest proportion of automated decision-making. For a computer scientist, this means more opportunities to fully spec out what’s going on in technical terms, which for some will make the scenarios easier to think about. In fact, for some AI researchers, the easiest way to prevent the unfolding of harmful “slow stories” might be to first focus on these “fast stories”, and then see what changes if some parts of the story are carried out more slowly by humans instead of machines.
Flash wars
Below are two more stories, this time ones in which the AI technology takes off relatively quickly:
Flash War, v.1
The Flash War v.1a and v.1b differ on the source of agency, but they share a similar RAAP: the deterrence of major threats with major threats.
Accidents vs RAAPs. One could also classify these flash wars as “accidents”, and indeed, techniques to make the attack detection systems less error-prone could help decrease the likelihood of this scenario. However, the background condition of deterring threats with threats is clearly also an essential causal component of the outcome. Zwetsloot & Dafoe might call this condition a “structural risk” (zwetsloot2018thinking), because it’s a risk posed by the structure of the relationship between the agents: in this case, a high level of distrust and an absence of de-escalation solutions. This underscores how “harmful accident” and “harmful RAAP” are not mutually exclusive event labels, and correspond to complementary approaches to making bad events less likely.
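To see why the structural condition matters so much, here is a rough back-of-the-envelope sketch (my own, with entirely made-up parameters for the false-alarm rate, number of detection systems, and how often the automated retaliation loop gets a chance to fire) of how a tiny per-check error rate compounds once the loop runs at machine speed rather than human speed:

```python
# Toy model: probability of at least one false alarm somewhere in the system.
# All parameters are illustrative assumptions.

def p_accidental_war(false_alarm_rate, num_systems, checks_per_year, years):
    """Probability that at least one false alarm occurs over the whole period."""
    p_quiet_per_check = (1 - false_alarm_rate) ** num_systems
    total_checks = checks_per_year * years
    return 1 - p_quiet_per_check ** total_checks

# Same per-check error rate; the only difference is how fast the loop runs.
print(p_accidental_war(1e-6, num_systems=2, checks_per_year=365, years=10))            # human-paced: ~0.7%
print(p_accidental_war(1e-6, num_systems=2, checks_per_year=365 * 24 * 60, years=10))  # machine-paced: ~100%
```

Making each detector less error-prone lowers the per-check rate, but as long as the deterrence loop itself stays in place and keeps accelerating, the cumulative risk keeps climbing regardless of which agent’s detector produces the fatal false alarm.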
Slow wars. Lastly, I’ll note that wars that play out slowly rather than quickly offer more chances for someone to interject peacemaking solutions into the situation, which might make the probability of human extinction higher in a flash war than in a slow war. However, that doesn’t mean slow-takeoff wars can’t happen or that they can’t destroy us. For instance, consider a world war in which each side keeps reluctantly building more and more lethal autonomous robots to target enemy citizens and leaders, with casualties gradually decimating the human population on both sides until no one is left.
Flash economies
Here’s another version of a Production Web that very quickly forms what you might call a “flash economy”:
The Production Web, v.1d: DAOs
(Some readers might notice that the concept of gray goo is essentially an even faster variant of the “factorial DAOs”, whose factories operate on a microscopic scale. Philip K. Dick’s short story Autofac also bears a strong resemblance.)
Without taking a position on exactly how fast the Production Web / Flash Economy story can be made to play out in reality, in all cases it seems particularly plausible to me that there would be multiple sources of agency in the mix, engaging in trade and/or conflict with each other. This isn’t to say that a single agency, such as a singleton, couldn’t build an Earth-tiling cascade of factory-factories; I’m sure one could. However, factory-factories might be more likely to develop under multipolar conditions than under unipolar conditions, due to competitive pressures selecting for agents (companies, DAOs, etc.) that produce things more quickly in order to trade and compete with other agents.
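That selection pressure can be illustrated with a toy replicator-dynamics model. This is my own sketch, and the producers and growth rates are arbitrary assumptions; the dynamic it shows is that market share flows toward whichever producer grows fastest, regardless of which company or DAO happens to occupy the fast-producer role:

```python
# Toy replicator dynamics: share of each producer grows in proportion to its fitness.
# All growth rates are illustrative assumptions.

def replicator_step(shares, growth_rates):
    """One step of discrete replicator dynamics."""
    weighted = [s * g for s, g in zip(shares, growth_rates)]
    total = sum(weighted)
    return [w / total for w in weighted]

shares = [1 / 3, 1 / 3, 1 / 3]   # e.g., two cautious producers and one speed-maximizer
growth_rates = [1.0, 1.0, 1.3]   # the third reinvests everything into producing faster

for _ in range(30):
    shares = replicator_step(shares, growth_rates)

print([round(s, 4) for s in shares])  # the fastest producer ends up with nearly all the market
```

Swapping which index is the speed-maximizer changes nothing about the outcome, which is exactly the agent-agnostic character of the competitive pressure described above.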
Conclusion
In multi-agent systems, robust processes can emerge that are not particularly sensitive to which agents carry out which parts of the process. I call these processes Robust Agent-Agnostic Processes (RAAPs), and claim that there are at least a few bad RAAPs that could pose existential threats to humanity as automation and AI capabilities improve. Wars and economies are categories of RAAPs that I consider relatively “obvious” to think about; however, there may be a much richer space of AI-enabled RAAPs that could yield existential threats or benefits to humanity. Hence, directing more x-risk-oriented AI research attention toward understanding RAAPs and how to make them safe for humanity seems prudent, and perhaps necessary, to ensure the existential safety of AI technology. Since researchers in multi-agent systems and multi-agent RL already think about RAAPs implicitly, these areas present a promising space for x-risk-oriented AI researchers to engage with and learn from.