I suspect he would claim that quickly building an AGI would not allow you to take over the world, because the AGI would not be that much more capable than the CAIS service cluster.
That does not seem to be his position though, because if AGI is not much more capable than CAIS, then there would be no need to talk specifically about how to defend the world against AGI, as he does at length in section 32. If that was his position, he could just talk about how ordinary policing and military defense would work in a CAIS world (i.e., against human adversaries wielding CAIS) and say that the same policing/defense would also work against AGI because AGI is not much more capable than CAIS.
Instead it seems clear that he thinks AGI requires special effort to defend against, which is made possible by a delay between SI-level CAIS and AGI, which he proposes that we use to do a very extensive “unopposed preparation”. I’ve been trying to figure out why he thinks there will be such a delay, and my current best guess is “Implementation of the AGI model is widely regarded as requiring conceptual breakthroughs.” (page 75), which he repeats on page 77: “AGI (but not CAIS) calls for conceptual breakthroughs to enable both implementation and subsequent safe application.” I don’t understand why he thinks such conceptual breakthroughs will be required though. Why couldn’t someone just take some appropriate AI services, connect them together in a straightforward way, and end up with an AGI? Do you get it? Or am I on the wrong track here?
I doubt I will ever be able to confidently answer yes to that question.
That does not seem to be his position though, because if AGI is not much more capable than CAIS, then there would be no need to talk specifically about how to defend the world against AGI, as he does at length in section 32.
My model is that he does think AGI won’t be much more capable than CAIS (see sections 12 and 13 in particular, and 10, 11 and 16 also touch on the topic), but lots of people (including me) kept making the argument that end-to-end training tends to improve performance and so AGI would outperform CAIS, and so he decided to write a response to that.
In general, my impression from talking to him and reading earlier drafts is that the earlier chapters are representative of his core models, while the later chapters are more like responses to particular arguments, or specific implications of those models.
I can give one positive argument for AGI being harder to make than SI-level CAIS. All of our current techniques for building AI systems create things that are bounded in the time horizon they are optimizing over. It’s actually quite unclear how we would use current techniques to get something that does very-long-term-planning. (This could be the “conceptual breakthroughs” point.) Seems a lot easier to get a bunch of bounded services and hook them up together in such a way that they can do the sorts of things that AGI agents could do.
The one scenario that is both concrete and somewhat plausible to me is that we run powerful deep RL on a very complex environment, and this finds an agent that does very-long-term-planning, because that’s what it takes to do well on the environment. I don’t know what Eric thinks about this scenario, but it doesn’t seem to influence his thinking very much (and in fact in the OP I argued that CAIS isn’t engaging enough with this scenario).
Why couldn’t someone just take some appropriate AI services, connect them together in a straightforward way, and end up with an AGI?
If you take a bunch of bounded services and connect them together in some straightforward way, you wouldn’t get something that is optimizing over the long term. Where did the long term optimization come from?
For example, you could take any long term task and break it down into the “plan maker” which thinks for an hour and gives a plan for the task, and the “plan executor” which takes an in-progress plan and executes the next step. Both of these are bounded and so could be services, and their combination is generally intelligent, but the combination wouldn’t have convergent instrumental subgoals.
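To make that concrete, here is a toy sketch (the names, interfaces, and stand-in “planning” logic are all mine, purely illustrative): each call is a bounded episode, and the only long-horizon structure lives in the plan object that gets passed between them.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    """An inert data object: a list of steps plus a pointer to the next one."""
    steps: list
    next_step: int = 0

def plan_maker(task, time_budget_seconds=3600):
    """Bounded service: 'thinks' for at most time_budget_seconds, returns a Plan, then stops."""
    # Stand-in for a real planning model; the point is only the bounded episode.
    return Plan(steps=[f"subtask {i + 1} of: {task}" for i in range(3)])

def plan_executor(plan):
    """Bounded service: executes exactly the next step of an in-progress plan, then stops."""
    print("executing:", plan.steps[plan.next_step])
    plan.next_step += 1
    return plan

# The combination: a dumb outer loop, not an optimizer with a long-term objective.
plan = plan_maker("run a conference")
while plan.next_step < len(plan.steps):
    plan = plan_executor(plan)
```

Neither function has an objective that extends past its own call; whatever looks like long-term optimization is encoded in the plan data and the outer loop.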
Thanks, I think this is helpful for me to understand Eric’s model better, but I’m still pretty confused.
It’s actually quite unclear how we would use current techniques to get something that does very-long-term-planning. (This could be the “conceptual breakthroughs” point.)
But it’s quite unclear how to use current techniques to do a lot of things. Why should we expect that this conceptual breakthrough would come later than other conceptual breakthroughs needed to achieve CAIS? (Given your disagreement with Eric on this, I guess this is more a question for him than for you.)
Where did the long term optimization come from?
I was assuming that long term strategic planners (as described in section 27) are available as an AIS, and would be one of the components of the hypothetical AGI.
For example, you could take any long term task and break it down into the “plan maker” which thinks for an hour and gives a plan for the task, and the “plan executor” which takes an in-progress plan and executes the next step. Both of these are bounded and so could be services, and their combination is generally intelligent, but the combination wouldn’t have convergent instrumental subgoals.
I don’t see why it wouldn’t, unless these services are specifically designed to be corrigible (in which case the “corrigible” part seems much more important than the “service” part). For example, suppose you asked the plan maker to create a plan to cure cancer. Why would the mere fact that it’s a bounded service prevent it from coming up with a plan that involves causing human extinction (and a bunch of convergent instrumental subgoals like deceiving humans who might stop it)? (If there was a human in the loop, then you could look at the plan and reject it, but I’m imagining that someone, in order to build an AGI as quickly and efficiently as possible, stripped off the “optimize for human consumption” part of the strategic planner and instead optimized it to produce plans for direct machine consumption.)
Why should we expect that this conceptual breakthrough would come later than other conceptual breakthroughs needed to achieve CAIS?
I think I share Eric’s intuition that this problem is hard in a more fundamental way than other things, but I don’t really know why I have this intuition. Some potential generators:
ML systems seem to be really good at learning tasks, but really bad at learning explicit reasoning. I think of CAIS as being on the side of “we never figure out explicit reasoning at the level that humans do it”, and making up for this deficit by having good simulators that allow us to learn from experience, or by collecting much more data across multiple instances of AI systems, or by trying out many different AI designs and choosing the one which performs best.
It seems like humans tend to build systems by making individual parts that we can understand and predict well, and putting those together in a way where we can make some guarantees/predictions about what will happen. CAIS plays to this strength, whereas “figure out how to do very-long-term-planning” doesn’t.
I don’t see why it wouldn’t, unless these services are specifically designed to be corrigible (in which case the “corrigible” part seems much more important than the “service” part).
Yeah, you’re right, I definitely said the wrong thing there. I guess the difference is that the convergent instrumental subgoals are now “one level up”—they aren’t subgoals of the AI service itself, they’re subgoals of the plan that was created by the AI service. It feels like this is qualitatively different and easier to address, but I can’t really say why. More generators:
In this setting, convergent instrumental subgoals happen only if the plan-making service is told to maximize outcomes. However, since it’s one level up, it should be easier to ask for something more like “do X, interpreted pragmatically and not literally”.
Things that happen one level up in the CAIS world are easier to point at and more interpretable, so it should be easier to find and fix issues of this sort.
(You could of course say “just because it’s easier that doesn’t mean people will do it”, but I could imagine that if it’s easy enough this becomes best practice and people do it by default, and you don’t actually gain very much by taking these parts out.)
I was assuming that long term strategic planners (as described in section 27) are available as an AIS, and would be one of the components of the hypothetical AGI.
Yeah, here also what I should have said is that the long term optimization is happening one level up, whereas with the typical AGI agent scenario it feels like the long term optimization needs to happen at the base level, and that’s the thing we don’t know how to do.
Unfortunately, I only vaguely understand the points that you’re trying to make in this comment… Would it be fair to just say at this point that this is an important crux that Eric failed to convincingly argue for?
I agree that it’s an important crux, and that the arguments are not sufficiently strong that everyone should believe Eric’s position. I do think that he has provided arguments that support his position, though they are in a different language/ontology than is usually used here.
Ah, ok, what sections would you suggest that I (re)read to understand his arguments better? (You mentioned 12, 13, 10, 11 and 16 earlier in this thread but back then we were talking about “AGI won’t be much more capable than CAIS” and here the topic is whether we should expect AGI to come later than CAIS or require harder conceptual breakthroughs.)
I quickly skimmed the table of contents to generate this list, so it might have both false positives and false negatives.
Section 1: We typically make progress using R&D processes; this can get us to superintelligence. Implicitly also makes the claim that this is qualitatively different from AGI, though doesn’t really argue for that.
Section 8: Optimization pressure points away from generality, not towards it, which suggests that strong optimization pressure doesn’t give you AGI.
Section 12.6: AGI and CAIS solve problems in different ways. (Combined with the claim, argued elsewhere: CAIS will happen first.)
Section 13: AGI agents are more complex. (Implicit claim: and so harder to build.)
Section 17: Most complex tasks involve several different subtasks that don’t interact much; so you get efficiency and generality gains by splitting the subtasks up into separate services.
Section 38: Division of labor + specialization are useful for good performance.
Most of these sections seem to only contain arguments that AGI won’t come earlier than CAIS, but not that it would come later than CAIS. In other words, they don’t argue against the likelihood that under CAIS someone can easily build an AGI by connecting existing AI services together in a straightforward way. The only section I can find among the ones you listed that tries to argue in this direction is Section 13, but even it mostly just argues that AGI isn’t simpler than CAIS, and not that it’s more complex, except for this paragraph in the summary, Section 13.5:
To summarize, in each of the areas outlined above, the classic AGI model both obscures and increases complexity: In order for general learning and capabilities to fit a classic AGI model, they must not only exist, but must be integrated into a single, autonomous, self-modifying agent. Further, achieving this kind of integration would increase, not reduce, the challenges of aligning AI behaviors with human goals: These challenges become more difficult when the goals of a single agent must motivate all (and only) useful tasks.
So putting alignment aside (I’m assuming that someone would be willing to build an unaligned AGI if it’s easy enough), the only argument Eric gives for greater complexity of AGI vs CAIS is “must be integrated into a single, autonomous, self-modifying agent”, but why should this integration add a non-negligible amount of complexity? Why can’t someone just take a plan maker, connect it to a plan executor, and connect that to the Internet to access other services as needed? (I think your argument that strategic planning may be one of the last AIS to arrive is plausible, but it doesn’t seem to be an argument that Eric himself makes.) Where is the additional complexity coming from?
Why can’t someone just take a plan maker, connect it to a plan executor, and connect that to the Internet to access other services as needed?
I think Eric would not call that an AGI agent.
Setting aside what Eric thinks and talking about what I think: There is one conception of “AGI risk” where the problem is that you have an integrated system that has optimization pressure applied to the system as a whole (similar to end-to-end training) such that the entire system is “pointed at” a particular goal and uses all of its intelligence towards that. The goal is a long-term goal over universe-histories. The agent can be modeled as literally actually maximizing the goal. These are all properties of the AGI itself.
With the system you described, there is no end-to-end training, and it doesn’t seem right to say that the overall system is aimed at a long-term goal, since it depends on what you ask the plan maker to do. I agree this does not clearly solve any major problem, but it does seem markedly different to me.
I think that Eric’s conception of “AGI agent” is like the first thing I described. I agree that this is not what everyone means by “AGI”, and it is particularly not the thing you mean by “AGI”.
You might argue that there seems to be no effective safety difference between an Eric-AGI-agent and the plan maker + plan executor. The main differences seem to be about what safety mechanisms you can add—such as looking at the generated plan, or using human models of approval to check that you have the right goal. (Whereas an Eric-AGI-agent is so opaque that you can’t look at things like “generated plans”, and you can’t check that you have the right goal because the Eric-AGI-agent will not let you change its goal.)
With an Eric-AGI-agent, if you try to create a human model of approval, that would need to be an Eric-AGI-agent itself in order to effectively supervise the first Eric-AGI-agent, but in that case the model of approval will be literally actually maximizing some goal like “be as accurate as possible”, which will lead to perverse behavior like manipulating humans so that what they approve is easier to predict. In CAIS, this doesn’t happen, because the approval model is not searching over possibilities that involve manipulating humans.
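To gesture at what I mean by the approval model not searching over manipulation, here is a toy sketch in my own terms (nothing like this appears in the report, and the scoring is a stand-in): the approval model sits outside the plan generator as a bounded scoring function applied to candidate plans, rather than being an agent with its own open-ended objective.

```python
def approval_model(plan):
    """Bounded predictor: estimates how much a human would approve of this plan.
    (Toy stand-in scoring; a real version would be a learned model.)"""
    return 0.0 if "deceive" in plan else 1.0

def pick_approved_plan(candidate_plans, threshold=0.9):
    """The approval model is applied as a filter over already-generated candidates;
    it scores plans, it does not search over ways to change what humans approve of."""
    approved = [p for p in candidate_plans if approval_model(p) >= threshold]
    return approved[0] if approved else None

print(pick_approved_plan([
    "deceive the overseers, then acquire resources",
    "run the trial and publish the results",
]))
```

The filter is just another bounded service, which is why it doesn’t inherit the “literally actually maximizing” problem described above.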
I was assuming that long term strategic planners (as described in section 27) are available as an AIS, and would be one of the components of the hypothetical AGI.
That’s not consistent with my understanding of section 27. My understanding is that Drexler would describe that as too dangerous.
suppose you asked the plan maker to create a plan to cure cancer.
I suspect that a problem here is that “plan maker” is ambiguous as to whether it falls within Drexler’s notion of something with a bounded goal.
CAIS isn’t just a way to structure software. It also requires some not-yet-common sense about what goals to give the software.
“Cure cancer” seems too broad to qualify as a goal that Drexler would consider safe to give to software. Sections 27 and 28 suggest that Drexler wants humans to break that down into narrower subtasks. E.g. he says:
By contrast, it is difficult to envision a development path in which AI developers would treat all aspects of biomedical research (or even cancer research) as a single task to be learned and implemented by a generic system.
After further rereading, I now think that what Drexler imagines is a bit more complex: (section 27.7) “senior human decision makers” would have access to a service with some strategic planning ability (which would have enough power to generate plans with dangerously broad goals), and they would likely restrict access to those high-level services.
I suspect Drexler is deliberately vague about the extent to which the strategic planning services will contain safeguards.
This, of course, depends on the controversial assumption that relatively responsible organizations will develop CAIS well before other entities are able to develop any form of equally powerful AI. I consider that plausible, but it seems to be one of the weakest parts of his analysis.
And presumably the publicly available AI services won’t be sufficiently general and powerful to enable random people to assemble them into an agent AGI? Combining a robocar + Google translate + an aircraft designer + a theorem prover doesn’t sound dangerous. But I’d prefer to have something more convincing than just “I spent a few minutes looking for risks, and didn’t find any”.
Fwiw, by my understanding of CAIS and my definition of a service here as “A service is an AI system that delivers bounded results for some task using bounded resources in bounded time”, a plan maker would qualify as a service. So every time I make claims about “services” I intend for those claims to apply to plan makers as well.
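To spell out what I mean by “bounded” in that definition, here is a toy sketch (my own framing rather than Drexler’s, and the plan_maker referenced in the final comment is hypothetical): the defining feature is the explicit budget attached to the call, not what the underlying model happens to be.

```python
import signal
from contextlib import contextmanager

@contextmanager
def time_limit(seconds):
    """Hard cap on wall-clock time for the wrapped call (Unix-only sketch)."""
    def _timeout(signum, frame):
        raise TimeoutError("service exceeded its time budget")
    signal.signal(signal.SIGALRM, _timeout)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)

def run_service(fn, task, time_budget_s, max_output_chars):
    """A 'service' call under this definition: bounded time, bounded output, then it stops."""
    with time_limit(time_budget_s):
        result = fn(task)
    return str(result)[:max_output_chars]  # bounded results

# A plan maker fits this interface as well as any narrow service does, e.g.:
# run_service(plan_maker, "design a bridge", time_budget_s=3600, max_output_chars=100_000)
```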
I have tried to use words the same way that Drexler does, but obviously I can’t know exactly what he meant.