Drexler on AI Risk
Eric Drexler has published a book-length paper on AI
risk, describing an approach that
he calls Comprehensive AI Services (CAIS).
His primary goal seems to be reframing AI risk discussions to use a
rather different paradigm than the one that Nick Bostrom and Eliezer
Yudkowsky have been promoting. (There isn’t yet any paradigm that’s
widely accepted, so this isn’t a
Kuhnian paradigm shift;
it’s better characterized as an amorphous field that is struggling to
establish its first paradigm). Dueling paradigms seems to be the best
that the AI safety field can manage to achieve for now.
I’ll start by mentioning some important claims that Drexler doesn’t
dispute:
an intelligence explosion might happen somewhat suddenly, in the fairly
near future;
it’s hard to reliably align an AI’s values with human values;
recursive self-improvement, as imagined by Bostrom / Yudkowsky,
would pose significant dangers.
Drexler likely disagrees about some of the claims made by Bostrom /
Yudkowsky on those points, but he shares enough of their concerns about
them that those disagreements don’t explain why Drexler approaches AI
safety differently. (Drexler is more cautious than most writers about
making any predictions concerning these three claims).
CAIS isn’t a full solution to AI risks. Instead, it’s better thought of
as an attempt to reduce the risk of world conquest by the first AGI that
reaches some threshold, preserve existing
corrigibility
somewhat past human-level AI, and postpone the need for a permanent
solution until we have more intelligence.
Stop Anthropomorphising Intelligence!
What I see as the most important distinction between the CAIS paradigm
and the Bostrom / Yudkowsky paradigm is Drexler’s objection to having
advanced AI be a unified, general-purpose agent.
Intelligence doesn’t require a broad mind-like utility function.
Mindspace is a small subset of the space of intelligence.
Instead, Drexler suggests composing broad AI systems out of many,
diverse, narrower-purpose components. Normal software engineering
produces components with goals that are limited to a specific output.
Drexler claims there’s no need to add world-oriented goals that would
cause a system to care about large parts of spacetime.
Systems built out of components with narrow goals don’t need to develop
much broader goals. Existing trends in AI research suggest that
better-than-human intelligence can be achieved via tools that have
narrow goals.
The AI-services model invites a functional analysis of service
development and delivery, and that analysis suggests that practical
tasks in the CAIS model are readily or naturally bounded in scope and
duration. For example, the task of providing a service is distinct
from the task of developing a system to provide that service, and
tasks of both kinds must be completed without undue cost or delay.
Drexler’s main example of narrow goals is Google’s machine translation,
which has no goals beyond translating the next unit of text. That
doesn’t imply any obvious constraint on how sophisticated its
world-model can be. It would be quite natural for AI progress to continue
with components whose “utility function” remains bounded like this.
It looks like this difference between narrow and broad goals can be
turned into a fairly rigorous distinction, but I’m dissatisfied with
available descriptions of the distinction. (I’d also like better names
for them.)
There are lots of clear-cut cases: narrow-task software that just waits
for commands, and on getting a command, it produces a result, then
returns to its prior state; versus a general-purpose agent which is
designed to maximize the price of a company’s stock.
But we need some narrow-task software to remember some information, and
once we allow memory, it gets complicated to analyze whether the
software’s goal is “narrow”.
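To make the contrast concrete, here is a minimal sketch in Python. Everything in it is my own illustration rather than anything from Drexler’s paper: a narrow-task service that answers one request at a time, a variant with a cache that shows why memory muddies the classification, and a broad agent organized around a world-oriented objective.

```python
from dataclasses import dataclass, field

# Narrow-task service: waits for a command, produces a result, and
# returns to its prior state. Its "goal" is exhausted by each request.
def summarize(text: str) -> str:
    return text[:100]  # stand-in for an actual model call

# Adding memory complicates the classification: a cache only affects
# efficiency, but it is persistent state, and persistent state is the
# first step toward behavior that depends on more than the current request.
@dataclass
class CachingSummarizer:
    cache: dict = field(default_factory=dict)

    def summarize(self, text: str) -> str:
        if text not in self.cache:
            self.cache[text] = summarize(text)
        return self.cache[text]

# Broad, world-oriented agent: picks whichever action its world-model
# predicts will best serve an objective defined over the outside world
# (e.g. a company's stock price), so every capability it has gets
# recruited toward that objective.
class StockMaximizingAgent:
    def __init__(self, objective):
        self.objective = objective      # function of (world_model, action)
        self.world_model = {}

    def step(self, observation: dict, possible_actions: list):
        self.world_model.update(observation)
        return max(possible_actions,
                   key=lambda a: self.objective(self.world_model, a))
```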
Drexler seems less optimistic than I am about clarifying this
distinction:
There is no bright line between safe CAI services and unsafe AGI
agents, and AGI is perhaps best regarded as a potential branch from an
R&D-automation/CAIS path.
Because there is no bright line between agents and non-agents, or
between rational utility maximization and reactive behaviors shaped by
blind evolution, avoiding risky behaviors calls for at least two
complementary perspectives: both (1) design-oriented studies that can
guide implementation of systems that will provide requisite degrees of
e.g., stability, reliability, and transparency, and (2) agent-oriented
studies support design by exploring the characteristics of systems
that could display emergent, unintended, and potentially risky
agent-like behaviors.
It may be true that a bright line can’t be explained clearly to laymen,
but I have a strong intuition that machine learning (ML) developers will
be able to explain it to each other well enough to agree on how to
classify the cases that matter.
6.7 Systems composed of rational agents need not maximize a utility function
There is no canonical way to aggregate utilities over
agents, and game theory shows that interacting sets of rational agents
need not achieve even Pareto optimality. Agents can compete to perform
a task, or can perform adversarial tasks such as proposing and
criticizing actions; from an external client’s perspective, these
uncooperative interactions are features, not bugs (consider the
growing utility of generative adversarial networks). Further,
adaptive collusion can be cleanly avoided: Fixed functions, for
example, cannot negotiate or adapt their behavior to align with
another agent’s purpose. … There is, of course, an even more
fundamental objection to drawing a boundary around a set of agents and
treating them as a single entity: In interacting with a set of agents,
one can choose to communicate with one or another (e.g. with an agent
or its competitor); if we assume that the agents are in effect a
single entity, we are assuming a constraint on communication that does
not exist in the multi-agent model. The models are fundamentally,
structurally inequivalent.
A Nanotech Analogy
Drexler originally described nanotechnology in terms of self-replicating
machines.
Later, concerns about grey goo
caused him to shift his recommendations toward a safer strategy, where
no single machine would be able to replicate itself, but where the
benefits of nanotechnology could be used recursively to improve
nanofactories.
Similarly, some of the more science-fiction style analyses suggest that
an AI with recursive self-improvement could quickly conquer the world.
Drexler’s CAIS proposal removes the “self-” from recursive
self-improvement, in much the same way that nanofactories removed the
“self-” from nanobot self-replication, replacing it with a more
decentralized process that involves preserving more features of existing
factories / AI implementations. The AI equivalent of nanofactories
consists of a set of AI services, each with a narrow goal, which
coordinate in ways that don’t qualify as a unified agent.
It sort of looks like Drexler’s nanotech background has had an important
influence on his views. Eliezer’s somewhat conflicting view seems to
follow a more science-fiction-like pattern of expecting one man to save
(or destroy?) the world. And I could generate similar stories for
mainstream AI researchers.
That doesn’t suggest much about who’s right, but it does suggest that
people are being influenced by considerations that are only marginally
relevant.
How Powerful is CAIS?
Will CAIS be slower to develop than recursive self-improvement? Maybe.
It depends somewhat on how fast recursive self-improvement is.
I’m uncertain whether to believe that human oversight is compatible with
rapid development. Some of that uncertainty comes from confusion about
what to compare it to (an agent AGI that needs no human feedback? or one
that often asks humans for approval?).
Some people expect unified agents to be more
powerful
than CAIS. How plausible are their concerns?
Some of it is disagreement over the extent to which human-level AI will
be built with currently understood techniques. (See Victoria
Krakovna’s
chart of what various people believe about this).
Could some of it be due to analogies to people? We have experience with
some very agenty businessmen (e.g. Elon Musk or Bill Gates), and some
bureaucracies made up of not-so-agenty employees (the post office, or
Comcast). I’m tempted to use the intuitions I get from those examples to
conclude that a unified agent AI will be more visionary and eager to
improve. But I worry that doing so anthropomorphises intelligence in a
way that misleads, since I can’t say anything more rigorous than “these
patterns look relevant”.
But if that analogy doesn’t help, then the novelty of the situation
hints that we should distrust Drexler’s extrapolation from standard software
practices (without placing much confidence in any alternative).
Cure Cancer Example
Drexler wants some limits on what gets automated. E.g. he wants to avoid
a situation where an AI is told to cure cancer, and does so without
further human interaction. That would risk generating a solution for
which the system misjudges human approval (e.g. mind uploading or
cryonic suspension).
Instead, he wants humans to decompose that into narrower goals (with
substantial AI assistance), such that humans could verify that the goals
are compatible with human welfare (or reject those that are too hard to
evaluate).
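Here is roughly how I picture that workflow, as a hypothetical sketch: the function names and the particular subtasks are my inventions, not the paper’s. Subtasks get proposed (with AI assistance), a human approves or rejects each one, and only approved subtasks run.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Subtask:
    description: str
    run: Callable[[], str]          # invocation of some narrow service

def propose_subtasks(goal: str) -> List[Subtask]:
    # In CAIS this decomposition would itself be heavily AI-assisted;
    # these placeholders just mark where narrow services would be called.
    return [
        Subtask(f"survey literature relevant to {goal}", lambda: "papers..."),
        Subtask("suggest candidate compounds", lambda: "compounds..."),
        Subtask("design lab experiments", lambda: "protocols..."),
    ]

def human_approves(task: Subtask) -> bool:
    # A human checks that the subtask is compatible with human welfare,
    # and rejects anything that is too hard to evaluate.
    answer = input(f"Approve '{task.description}'? [y/n] ")
    return answer.strip().lower() == "y"

def pursue(goal: str) -> List[str]:
    results = []
    for task in propose_subtasks(goal):
        if human_approves(task):            # nothing runs unapproved
            results.append(task.run())
    return results

# pursue("a cure for some cancer")  # interactive: prompts for each subtask
```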
This seems likely to delay cancer cures compared to what an agent AGI
would do, maybe by hours, maybe by months, as the humans check the
subtasks. I expect most people would accept such a delay as a reasonable
price for reducing AI risks. I haven’t thought of a realistic example
where I expect the delay would generate a strong incentive for using an
agent AGI, but the cancer example is close enough to be unsettling.
This analysis is reassuring compared to
Superintelligence,
but not as reassuring as I’d like.
As I was writing the last few paragraphs, and thinking about Wei Dai’s
objections,
I found it hard to clearly model how CAIS would handle the cancer
example.
Some of Wei Dai’s objections result from a disagreement about whether
agent AGI has benefits. But his objections suggest other questions, for
which I needed to think carefully in order to guess how Drexler would
answer them: How much does CAIS depend on human judgment about what
tasks to give to a service? Probably quite heavily, in some cases. How
much does CAIS depend on the system having good estimates of human
approval? Probably not too much, as long as experts are aware of how
good those estimates are, and are willing and able to restrict access to
some relatively risky high-level services.
I expect ML researchers can identify a safe way to use CAIS, but it
doesn’t look very close to an idiot-proof framework, at least not
without significant trial and error. I presume there will in the long
run be a need for an idiot-proof interface to most such services, but I
expect those to be developed later.
What Incentives will influence AI Developers?
With grey goo, it was pretty clear that most nanotech developers would
prefer the nanofactory approach, since it was safer and had few
downsides.
With CAIS, the incentives are less clear, because it’s harder to tell
whether there will be benefits to agent AGIs.
Much depends on the controversial assumption that relatively responsible
organizations will develop CAIS well before other entities are able to
develop any form of equally powerful AI. I consider that plausible, but
it seems to be one of the weakest parts of Drexler’s analysis.
If I knew that AI required expensive hardware, I might be confident that
the first human-level AIs would be developed at large, relatively
risk-averse institutions.
But Drexler has a novel(?) approach (section 40) which suggests that
existing supercomputers have about human-level raw computing power. That
provides a reason for worrying that a wider variety of entities could
develop powerful AI.
Drexler seems to extrapolate current trends, implying that the first
entity to generate human-level AI will look like Google or OpenAI.
Developers there seem likely to be satisfied enough with the kind of
intelligence explosion that CAIS would produce that only moderate
concern about risks would deter them from pursuing something more
dangerous.
Whereas a poorly funded startup, or the stereotypical lone hacker in a
basement, might be more tempted to gamble on an agent AGI. I have some
hope that human-level AI will require a wide variety of service-like
components, maybe too much for a small organization to handle. But I
don’t like relying on that.
Presumably the publicly available AI services won’t be sufficiently
general and powerful to enable random people to assemble them into an
agent AGI? Combining a robocar + Google Translate + an aircraft designer
+ a theorem prover doesn’t sound dangerous. Section 27.7 predicts that
“senior human decision makers” would have access to a service with some
strategic planning ability (which would have enough power to generate
plans with dangerously broad goals), and they would likely restrict
access to those high-level services. See also section 39.10 for why any
one service doesn’t need to have a very broad purpose.
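I imagine the access restriction working something like the following sketch. The registry, the tier labels, and the clearance check are all my assumptions about one way it could be implemented; section 27.7 only says that access to high-level planning services would likely be restricted.

```python
# Hypothetical access gating for AI services, with high-level strategic
# planning restricted to callers holding the appropriate clearance.
LOW, HIGH = "low", "high"

SERVICES = {
    "translate":          {"tier": LOW,  "fn": lambda req: f"translation({req})"},
    "design_aircraft":    {"tier": LOW,  "fn": lambda req: f"design({req})"},
    "strategic_planning": {"tier": HIGH, "fn": lambda req: f"plan({req})"},
}

def call_service(name: str, request: str, clearance: str) -> str:
    service = SERVICES[name]
    if service["tier"] == HIGH and clearance != HIGH:
        raise PermissionError(f"{name} is limited to senior decision makers")
    return service["fn"](request)

print(call_service("translate", "hello", clearance=LOW))
# call_service("strategic_planning", "gain market share", clearance=LOW)
# would raise PermissionError
```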
I’m unsure where Siri and Alexa fit in this framework. Their designers
have some incentive to incorporate goals that extend well into the
future, in order to better adapt to individual customers, by improving
their models of each customer’s desires. I can imagine that being fully
compatible with a CAIS approach, but I can also imagine them being given
utility functions that would cause them to act quite agenty.
How Valuable is Modularity?
CAIS may be easier to develop, since modularity normally makes software
development easier. On the other hand, modularity seems less important
for ML. On the gripping hand, AI developers will likely be combining ML
with other techniques, and modularity seems likely to be valuable for
those systems, even if the ML parts are not modular. Section 37 lists
examples of systems composed of both ML and traditional software.
And as noted in a recent paper from Google, “Only a small fraction of
real-world ML systems is composed of the ML code [...] The required
surrounding infrastructure is vast and complex.” [Sculley et al.
2015]
Neural networks and symbolic/algorithmic AI technologies are
complements, not alternatives; they are being integrated in multiple
ways at levels that range from components and algorithms to system
architectures.
How much less important is modularity for ML? A typical ML system seems
to do plenty of re-learning from scratch, when we could imagine it
delegating tasks to other components. On the other hand, ML developers
seem to be fairly strongly sticking to the pattern of assigning only
narrow goals to any instance of an ML service, typically using
high-level human judgment to integrate that with other parts.
I expect robocars to provide a good test of how much ML is pushing
software development away from modularity. If CAIS is generally
correct, I’d expect a robocar to have more than 10 independently
trained ML modules integrated into the main software that does the
driving, whereas I’d expect fewer than 10 if Drexler were wrong about
modularity. My cursory search did not find any clear answer—can anyone
resolve this?
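For concreteness, here is the kind of structure I’d count as confirming the modular picture; the module names are purely illustrative guesses, not a description of any real driving stack.

```python
# Illustrative only: a driving stack where many independently trained ML
# modules feed a conventional (non-ML) integration layer.
ML_MODULES = [
    "lane_detection", "traffic_light_classification", "sign_recognition",
    "pedestrian_detection", "vehicle_detection", "free_space_segmentation",
    "object_tracking", "trajectory_prediction", "localization_refinement",
    "driver_monitoring", "path_ranking",
]

def run_module(name: str, sensor_frame) -> str:
    return f"{name}_output"                    # placeholder for a trained model

def plan_controls(perceptions: dict) -> dict:
    return {"steering": 0.0, "throttle": 0.1}  # ordinary software, not ML

def drive_one_tick(sensor_frame) -> dict:
    # Each module has a narrow goal and is trained separately; the
    # integration step below is the main (non-ML) driving software.
    perceptions = {name: run_module(name, sensor_frame) for name in ML_MODULES}
    return plan_controls(perceptions)

print(len(ML_MODULES), "independently trained modules")  # 11, i.e. more than 10
```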
I suspect that most ML literature tends to emphasize monolithic software
because that’s easier to understand, and because those papers focus on
specific new ML features, to which modularity is not very relevant.
Maybe there’s a useful analogy to markets—maybe people underestimate
CAIS because very decentralized systems are harder for people to model.
People often imagine that decentralized markets are less efficient than
centralized command and control, and only seem to tolerate markets after
seeing lots of evidence (e.g. the collapse of communism). On the other
hand, Eliezer and Bostrom don’t seem especially prone to underestimate
markets, so I have low confidence that this guess explains much.
Alas, skepticism of decentralized systems might mean that we’re doomed
to learn the hard way that the same principles apply to AI development
(or fail to learn, because we don’t survive the first mistake).
Transparency?
MIRI has been worrying about the opaqueness of neural nets and similar
approaches to AI, because it’s hard to evaluate the safety of a large,
opaque system. I suspect that complex world-models are inherently hard
to analyze. So I’d be rather pessimistic if I thought we needed the kind
of transparency that MIRI hopes for.
Drexler points out that opaqueness causes fewer problems under the CAIS
paradigm. Individual components may often be pretty opaque, but
interactions between components seem more likely to follow a transparent
protocol (assuming designers value that). And as long as the opaque
components have sufficiently limited goals, the risks that might hide
under that opaqueness are constrained.
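Here is a minimal sketch of what I take “transparent protocol between opaque components” to mean, with message fields and logging that I’ve made up for illustration: the models’ internals stay opaque, but every request between services has a declared, inspectable form.

```python
import json, time
from dataclasses import dataclass, asdict

@dataclass
class ServiceRequest:
    sender: str          # which component is asking
    recipient: str       # which narrow service is being asked
    task: str            # bounded task description
    payload: str

def send(request: ServiceRequest, handler) -> str:
    # The protocol layer is transparent: every interaction is logged in a
    # structured form, even though handler() may be an opaque model.
    print(json.dumps({"time": time.time(), **asdict(request)}))
    return handler(request.payload)

# Usage: the translator's internals are opaque, but what was asked of it,
# and by whom, is inspectable after the fact.
result = send(
    ServiceRequest("report_writer", "translator", "translate_en_to_fr", "hello"),
    handler=lambda text: f"fr({text})",
)
```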
Transparent protocols enable faster development by humans, but I’m
concerned that it will be even faster to have AIs generating systems
with less transparent protocols.
Implications
The differences between CAIS and agent AGI ought to define a threshold,
which could function as a fire
alarm for AI experts.
If AI developers need to switch to broad utility functions in order to
compete, that will provide a clear sign that AI risks are high, and that
something’s wrong with the CAIS paradigm.
CAIS indicates that it’s important to have a consortium of AI companies
to promote safety guidelines, and to propagate a consensus view on how
to stay on the safe side of the narrow versus broad task threshold.
CAIS helps reduce the pressure to classify typical AI research as
dangerous, and therefore reduces AI researchers’ motivation to resist AI
safety research.
Some implications for AI safety researchers in general: don’t imply that
anyone knows whether recursive self-improvement will beat other forms of
recursive improvement. We don’t want to tempt AI researchers to try
recursive self-improvement (by telling people it’s much more powerful).
And we don’t want to err much in the other direction, because we don’t
want people to be complacent about the risks of recursive
self-improvement.
Conclusion
CAIS seems somewhat more grounded in existing software practices than,
say, the paradigm used in Superintelligence, and provides more reasons
for hope. Yet it provides little reason for complacency:
The R&D-automation/AI-services model suggests that conventional AI
risks (e.g., failures, abuse, and economic disruption) are apt to
arrive more swiftly than expected, and perhaps in more acute forms.
While this model suggests that extreme AI risks may be relatively
avoidable, it also emphasizes that such risks could arise more quickly
than expected.
I see important uncertainty in whether CAIS will be as fast and
efficient as agent AGI, and I don’t expect any easy resolution to that
uncertainty.
This paper is a good starting point, but we need someone to transform it
into something more rigorous.
CAIS is sufficiently similar to standard practices that it doesn’t
require much work to attempt it, and creates few risks.
I’m around 50% confident that CAIS plus a normal degree of vigilance by
AI developers will be sufficient to avoid global catastrophe from AI.