I agree that it’s an important crux, and that the arguments are not sufficiently strong that everyone should believe Eric’s position. I do think that he has provided arguments that support his position, though they are in a different language/ontology than is usually used here.
Ah, ok, what sections would you suggest that I (re)read to understand his arguments better? (You mentioned 12, 13, 10, 11 and 16 earlier in this thread but back then we were talking about “AGI won’t be much more capable than CAIS” and here the topic is whether we should expect AGI to come later than CAIS or require harder conceptual breakthroughs.)
I quickly skimmed the table of contents to generate this list, so it might have both false positives and false negatives.
Section 1: We typically make progress using R&D processes; this can get us to superintelligence. Implicitly also makes the claim that this is qualitatively different from AGI, though doesn’t really argue for that.
Section 8: Optimization pressure points away from generality, not towards it, which suggests that strong optimization pressure doesn’t give you AGI.
Section 12.6: AGI and CAIS solve problems in different ways. (Combined with the claim, argued elsewhere: CAIS will happen first.)
Section 13: AGI agents are more complex. (Implicit claim: and so harder to build.)
Section 17: Most complex tasks involve several different subtasks that don’t interact much; so you get efficiency and generality gains by splitting the subtasks up into separate services.
Section 38: Division of labor + specialization are useful for good performance.
Most of these sections seem to only contain arguments that AGI won’t come earlier than CAIS, but not that it would come later than CAIS. In other words, they don’t argue against the likelihood that under CAIS someone can easily build an AGI by connecting existing AI services together in a straightforward way. The only section I can find among the ones you listed that tries to argue in this direction is Section 13, but even it mostly just argues that AGI isn’t simpler than CAIS, and not that it’s more complex, except for this paragraph in the summary, Section 13.5:
To summarize, in each of the areas outlined above, the classic AGI model both obscures and increases complexity: In order for general learning and capabilities to fit a classic AGI model, they must not only exist, but must be integrated into a single, autonomous, self-modifying agent. Further, achieving this kind of integration would increase, not reduce, the challenges of aligning AI behaviors with human goals: These challenges become more difficult when the goals of a single agent must motivate all (and only) useful tasks.
So putting alignment aside (I’m assuming that someone would be willing to build an unaligned AGI if it’s easy enough), the only argument Eric gives for greater complexity of AGI vs CAIS is “must be integrated into a single, autonomous, self-modifying agent”, but why should this integration add a non-negligible amount of complexity? Why can’t someone just take a plan maker, connect it to a plan executor, and connect that to the Internet to access other services as needed? (I think your argument that strategic planning may be one of the last AIS to arrive is plausible, but it doesn’t seem to be an argument that Eric himself makes.) Where is the additional complexity coming from?
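For concreteness, here is a minimal sketch of the kind of straightforward composition I have in mind. All of the names here (PlanMaker, PlanExecutor, ServiceRegistry) are hypothetical stand-ins for whatever planning, execution, and task-specific services would exist under CAIS; the point is just that wiring them together looks like routine plumbing rather than a conceptual breakthrough:

```python
# Hypothetical sketch only: none of these services exist as named; they stand
# in for whatever planning, execution, and task-specific AI services CAIS
# would make available.

from dataclasses import dataclass


@dataclass
class Step:
    """One step of a plan: which service to call, and with what request."""
    service: str
    request: str


class PlanMaker:
    """Stand-in for a planning service: goal in, sequence of steps out."""

    def make_plan(self, goal: str) -> list[Step]:
        # A real planning service would decompose the goal into subtasks;
        # this stub just produces a single placeholder step.
        return [Step(service="web_search", request=goal)]


class ServiceRegistry:
    """Stand-in for "the Internet": looks up other AI services by name."""

    def call(self, service: str, request: str) -> str:
        # A real registry would dispatch the request to an external service.
        return f"[result of {service}({request!r})]"


class PlanExecutor:
    """Walks the plan, calling whichever service each step asks for."""

    def __init__(self, registry: ServiceRegistry):
        self.registry = registry

    def execute(self, plan: list[Step]) -> list[str]:
        return [self.registry.call(step.service, step.request) for step in plan]


if __name__ == "__main__":
    planner = PlanMaker()
    executor = PlanExecutor(ServiceRegistry())
    plan = planner.make_plan("book a venue for the workshop")
    print(executor.execute(plan))
```

The hard part is obviously building the services themselves, but under CAIS those are assumed to already exist, so the question is what additional work the glue code requires.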
Why can’t someone just take a plan maker, connect it to a plan executor, and connect that to the Internet to access other services as needed?
I think Eric would not call that an AGI agent.
Setting aside what Eric thinks and talking about what I think: There is one conception of “AGI risk” where the problem is that you have an integrated system that has optimization pressure applied to the system as a whole (similar to end-to-end training) such that the entire system is “pointed at” a particular goal and uses all of its intelligence towards that. The goal is a long-term goal over universe-histories. The agent can be modeled as literally actually maximizing the goal. These are all properties of the AGI itself.
With the system you described, there is no end-to-end training, and it doesn’t seem right to say that the overall system is aimed at a long-term goal, since it depends on what you ask the plan maker to do. I agree this does not clearly solve any major problem, but it does seem markedly different to me.
I think that Eric’s conception of “AGI agent” is like the first thing I described. I agree that this is not what everyone means by “AGI”, and it is particularly not the thing you mean by “AGI”.
You might argue that there seems to be no effective safety difference between an Eric-AGI-agent and the plan maker + plan executor. The main differences seem to be about what safety mechanisms you can add—such as looking at the generated plan, or using human models of approval to check that you have the right goal. (Whereas an Eric-AGI-agent is so opaque that you can’t look at things like “generated plans”, and you can’t check that you have the right goal because the Eric-AGI-agent will not let you change its goal.)
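To make that concrete, here is a rough sketch of where those checks could slot in between the plan maker and the plan executor. Everything here is hypothetical (the plan representation, the services, the approval check); the point is only that the composed system has a seam where the plan is a legible artifact you can inspect and veto, whereas an Eric-AGI-agent offers no such seam:

```python
# Hypothetical sketch only: a generic plan representation and stand-in
# services, used to show where "inspect the plan" and "check against a model
# of approval" can be inserted before anything is executed.

from typing import Callable

Plan = list[str]  # a plan as a list of human-readable step descriptions


def run_with_checks(goal: str,
                    make_plan: Callable[[str], Plan],
                    execute: Callable[[Plan], None],
                    approves: Callable[[Plan], bool]) -> None:
    plan = make_plan(goal)

    # Safety hook 1: the generated plan is a legible artifact that a human
    # (or another service) can look at before anything happens.
    for step in plan:
        print("proposed step:", step)

    # Safety hook 2: check the plan against a model of human approval, and
    # refuse to execute if it fails (a chance to catch a mis-specified goal).
    if not approves(plan):
        raise RuntimeError("plan rejected by approval check")

    execute(plan)


if __name__ == "__main__":
    run_with_checks(
        goal="draft the quarterly report",
        make_plan=lambda g: [f"gather data for {g!r}", f"write a draft of {g!r}"],
        execute=lambda plan: print(f"executing {len(plan)} steps"),
        approves=lambda plan: True,  # toy stand-in for a real approval model
    )
```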
With an Eric-AGI-agent, if you try to create a human model of approval, that would need to be an Eric-AGI-agent itself in order to effectively supervise the first Eric-AGI-agent, but in that case the model of approval will be literally actually maximizing some goal like “be as accurate as possible”, which will lead to perverse behavior like manipulating humans so that what they approve is easier to predict. In CAIS, this doesn’t happen, because the approval model is not searching over possibilities that involve manipulating humans.