Eric and I have exchanged a few emails since I posted this summary; I’m posting some of that exchange here (with his permission), edited by me for conciseness and clarity. The paragraphs in the quotes are Eric’s, but I have rearranged his paragraphs and omitted some of them for better flow in this comment.
There is a widespread intuition that AGI agents would by nature be more integrated, flexible, or efficient than comparable AI services. I am persuaded that this is wrong, and stems from an illusion of simplicity that results from hiding mechanism in a conceptually opaque box, a point that is argued at some length in Section 13.
Overall, I think that many of us have been in the habit of seeing flexible optimization itself as a problem, when optimization is instead (in the typical case) a strong constraint on a system’s behavior (see Section 8). Flexibility of computation in pursuit of optimization for bounded tasks seems simply useful, regardless of planning horizon, scope of considerations, or scope of required knowledge.
I agree that AGI agents hide mechanism in an opaque box. I also agree that the sort of optimization that current ML does, which is very task-focused, is a strong constraint on behavior. There seems to be a different sort of optimization that humans are capable of, where we can enter a new domain and perform well in it very quickly; I don’t have a good understanding of that sort of optimization, and I think that’s what the classic AGI agent risks are about.
Relatedly, I’ve used the words “monolithic AGI agent” a bunch in the summary and the post. Now, I want to instead talk about whether AI systems will be opaque and well-integrated, since that’s the main crux of our disagreement. It’s plausible to me that even if they are opaque and well-integrated, you don’t get the classic AGI agent risks, because you don’t get the sort of optimization I was talking about above.
In this connection, you cite the power of end-to-end training, but Section 17.4 (“General capabilities comprise many tasks and end-to-end relationships”) argues that, because diverse tasks encompass many end-to-end relationships, the idea that a broad set of tasks can be trained “end to end” is mistaken, a result of the narrowness of current trained systems in which services form chains rather than networks that are more wide than deep. We should instead expect that broad capabilities will best be implemented by sets of systems (or sets of end-to-end chains of systems) that comprise well-focused competencies: Systems that draw on distinct subtask competencies will typically be easier to train and provide more robust and general performance (Section 17.5). Modularity typically improves flexibility and generality, rather than impeding it.
Note that the ability to employ subtask components in multiple contexts constitutes a form of transfer learning, and [...] this transfer learning can carry with it task-specific aspects of behavioral alignment.
This seems like the main crux of the disagreement. My claim is that for any particular task, given enough compute, data, and model size, an opaque, well-integrated, unstructured AI system will outperform a transparent, modular collection of services. This is only on the axis of performance at the task: I agree that the structured system will generalize better out of distribution (which leads to robustness, flexibility, and better transfer learning). I’m basing this primarily on empirical evidence and intuitions:
For many tasks so far (computer vision, NLP, robotics), transitioning from a modular architecture to end-to-end deep learning led to large boosts in performance.
My impression is that many interdisciplinary academics are able to transfer ideas and intuitions from one of their fields to the other, allowing them to make big contributions that more experienced researchers could not. This suggests that patterns of problem-solving from one field can transfer to another in a non-trivial way, a kind of transfer that would be best achieved with well-integrated systems.
Psychology research can be thought of as an attempt to systematize/modularize our knowledge about humans. Despite a huge amount of work in psychology, our internal, implicit, well-integrated models of humans are way better than our explicit theories.
Humans definitely solve large tasks in a very structured way; I hypothesize that this is because for those tasks the limits of human compute/data/brain size prevent us from getting the benefits of an unstructured heuristic approach.
Speaking of integration:
Regarding integration, I’ve argued that classic AGI-agent models neither simplify nor explain general AI capabilities (Section 13.3), including the integration of competencies. Whatever integration of functionality one expects to find inside an opaque AGI agent must be based on mechanisms that presumably apply equally well to integrating relatively transparent systems of services. These mechanisms can be dynamic, rather than static, and can include communication via opaque vector embeddings, jointly fine-tuning systems that perform often-repeated tasks, and matching of tasks to services (including service-development services) in semantically meaningful “task spaces” (discussed in Section 39, “Tiling task-space with AI services can provide general AI capabilities”).
[...]
Direct lateral links between competencies such as organic synthesis, celestial mechanics, ancient Greek, particle physics, image interpretation, algorithm design, traffic planning (etc.) are likely to be sparse, particularly when services perform object-level tasks. This sparseness is, I think, inherent in natural task-structures, quite independent of human cognitive limitations.
(The paragraphs above were written in a response to me while I was still using the phrase “AGI agents”)
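As an illustration of the last mechanism Eric mentions, here is a minimal sketch of matching tasks to services in a semantically meaningful “task space”. All of the service names, descriptions, and the embedding function are invented placeholders; the point is only the nearest-neighbor-in-embedding-space structure, not any particular model.

```python
# Toy sketch of task-to-service matching in a shared embedding "task space".
# The embed() function is a stand-in for a real text-embedding model and is
# not semantically meaningful; service names/descriptions are hypothetical.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Placeholder embedding: a normalized random projection keyed on the text."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

# Each service advertises a description of the tasks it is competent at.
services = {
    "organic_synthesis_planner": "plan multi-step organic chemistry syntheses",
    "ancient_greek_translator": "translate ancient Greek text into English",
    "traffic_planner": "optimize traffic-light schedules for a road network",
}
service_vecs = {name: embed(desc) for name, desc in services.items()}

def route(task_description: str) -> str:
    """Send a task to the service whose advertised competence is nearest in task space."""
    t = embed(task_description)
    return max(service_vecs, key=lambda name: float(t @ service_vecs[name]))

# With a real embedding model this would route to the Greek translator.
print(route("translate this passage of Homer into English"))
```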
I expect that the more you integrate the systems of services, the more opaque they will become. The resulting system will be less interpretable; it will be harder to reason about what information particular services do not have access to (Section 9.4); and it will be harder to tell when malicious behavior is happening. The safety affordances identified in CAIS no longer apply, because there is not enough modularity between services.
Re: sparseness being inherent in task-structures, I think this is a result of human cognitive limitations, but I don’t know how to argue further for that perspective.
Can you summarize this exchange, especially what updates you made as a result of it, if any?
That was the summary :P The full thing was quite a bit longer. I also didn’t want to misquote Eric.
Maybe the shorter summary is: there are two axes which we can talk about. First, will systems be transparent, modular and structured (call this CAIS-like), or will they be opaque and well-integrated? Second, assuming that they are opaque and well-integrated, will they have the classic long-term goal-directed AGI-agent risks or not?
Eric and I disagree on the first one: my position is that for any particular task, while CAIS-like systems will be developed first, they will gradually be replaced by well-integrated ones, once we have enough compute, data, and model capacity.
I’m not sure how much Eric and I disagree on the second one: I think it’s reasonable to predict that the resulting systems are specialized for particular bounded tasks and so won’t be running broad searches for long-term plans. I would still worry about inner optimizers; I don’t know what Eric thinks about that worry.
This summary is more focused on my beliefs than Eric’s, and is probably not a good summary of the intent behind the original comment, which was “what does Eric think Rohin got wrong in his summary + opinion of CAIS”, along with some commentary from me trying to clarify my beliefs.
Updates were mainly about actually carving up the space in the way above. Probably others, but I often find it hard to introspect on how my beliefs are updating.
I don’t understand why this crux needs to be dichotomous. Setting aside the opacity question for the moment, why can’t services in a CAIS be differentiable w.r.t. each other?
Example: Consider a language modeling service (L) that is consumed by several downstream tasks, including various text classifiers, an auto-correction service for keyboards, and a machine translation service. In the end-to-end view, it would be wise for these downstream services to use a language representation from L and to propagate their own error information back to L so that it can improve its shared representation. Since the downstream services ultimately constitute L’s raison d’être, L will be obliged to do so.
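For concreteness, here is a minimal sketch of that setup, assuming a toy PyTorch architecture (the module names, dimensions, and heads are all invented for illustration): a shared language service L feeds several downstream heads, and each head’s loss backpropagates into L, jointly shaping the shared representation.

```python
# Minimal sketch (hypothetical names/sizes): a shared language service L whose
# representation is consumed by several downstream heads, trained so that each
# head's error signal flows back into L.
import torch
import torch.nn as nn

class LanguageService(nn.Module):            # "L": the shared language model
    def __init__(self, vocab_size=10_000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)

    def forward(self, tokens):                # tokens: (batch, seq_len) of token ids
        hidden, _ = self.encoder(self.embed(tokens))
        return hidden.mean(dim=1)             # pooled sentence-level representation

class DownstreamHead(nn.Module):              # e.g. a text classifier or autocorrect head
    def __init__(self, dim=256, out_dim=2):
        super().__init__()
        self.proj = nn.Linear(dim, out_dim)

    def forward(self, rep):
        return self.proj(rep)

L = LanguageService()
heads = {"sentiment": DownstreamHead(out_dim=2),
         "autocorrect": DownstreamHead(out_dim=10_000)}
params = list(L.parameters()) + [p for h in heads.values() for p in h.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)

def joint_step(batches):
    """batches: dict mapping head name -> (tokens, labels) tensors."""
    opt.zero_grad()
    total = torch.tensor(0.0)
    for name, (tokens, labels) in batches.items():
        rep = L(tokens)                       # shared representation from L
        loss = nn.functional.cross_entropy(heads[name](rep), labels)
        total = total + loss                  # every head's gradient reaches L
    total.backward()
    opt.step()
    return float(total)
```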
For situations that are not so neatly differentiable, we can describe the services network as a stochastic computation graph if there is a benefit to training the entire system end-to-end. This suggests a slightly more precise conjecture about the relationship between a CAIS agent and a utility-maximizing agent: a CAIS agent that can be described as a stochastic computation graph is equivalent to some utility-maximizing agent when trained end-to-end via approximate backpropagation.
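A rough sketch of what “approximate backpropagation” through such a graph could mean, with an invented two-service pipeline: an upstream service makes a discrete (non-differentiable) routing choice, and a score-function (REINFORCE) term carries the downstream task loss back through that choice while ordinary backprop handles the rest.

```python
# Toy sketch (invented services/shapes): a pipeline treated as a stochastic
# computation graph. A "router" service samples a discrete choice of downstream
# expert; the discrete edge is trained with a REINFORCE surrogate, the rest by
# ordinary backpropagation.
import torch
import torch.nn as nn

router = nn.Linear(16, 3)                          # upstream service: picks one of 3 experts
experts = nn.ModuleList([nn.Linear(16, 1) for _ in range(3)])
opt = torch.optim.Adam(list(router.parameters()) + list(experts.parameters()), lr=1e-3)

def end_to_end_step(x, target):
    """x: (16,) input tensor; target: scalar tensor."""
    dist = torch.distributions.Categorical(logits=router(x))
    choice = dist.sample()                          # discrete, non-differentiable edge
    pred = experts[choice.item()](x)
    loss = (pred - target).pow(2).mean()            # task loss on the final output
    # Score-function term lets the loss signal reach the router through the sample.
    surrogate = loss + loss.detach() * dist.log_prob(choice)
    opt.zero_grad()
    surrogate.backward()
    opt.step()
    return float(loss)
```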
It’s likely that CAIS agents aren’t usefully described as stochastic computation graphs, or that we may need to extend the notion of a “stochastic computation graph” to deal with services that create other services as offspring and attach them to the graph. But the possibility itself suggests a spectrum between the archetypal modular CAIS and an end-to-end CAIS, in which subgraphs of the services network are trained end-to-end. It’s not obvious to me that CAIS as defined in the text rules out this scenario, despite Eric’s comments here.
I broadly agree, especially if you set aside opacity; I very rarely mean to imply a strict dichotomy.
I do think that in the scenario you outlined the main issue would be opacity: the learned language representation would become more and more specialized to the various services, becoming less interpretable to humans and more “integrated” across services.
One way to test the “tasks don’t overlap” idea is to have two nets do two different tasks, but connect their internal layers. Then see how large the weights on those cross-connections get. Like, is the internal processing done by a Mario AI useful for Greek translation at all? If it is, then backprop etc. should discover that.
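A minimal sketch of that experiment, with invented toy architectures: two small nets for unrelated tasks, where each net’s hidden layer also receives the other’s hidden layer through a learned cross-connection. After joint training, the norm of the cross-connection weights is a rough signal of how much one task’s internal processing helps the other; an L1 penalty on those weights makes the signal more meaningful, since unused connections are pushed toward zero.

```python
# Toy sketch (hypothetical shapes/tasks): two nets with learned cross-connections
# between their hidden layers. After joint training, large cross-connection weights
# would suggest the tasks share useful internal processing; near-zero weights
# (especially under an L1 penalty) would suggest they don't.
import torch
import torch.nn as nn

class CrossConnectedPair(nn.Module):
    def __init__(self, in_a=32, in_b=32, hidden=64, out_a=4, out_b=4):
        super().__init__()
        self.enc_a = nn.Linear(in_a, hidden)                   # task A's own processing
        self.enc_b = nn.Linear(in_b, hidden)                   # task B's own processing
        self.cross_ab = nn.Linear(hidden, hidden, bias=False)  # A's hidden -> B's head
        self.cross_ba = nn.Linear(hidden, hidden, bias=False)  # B's hidden -> A's head
        self.head_a = nn.Linear(hidden, out_a)
        self.head_b = nn.Linear(hidden, out_b)

    def forward(self, x_a, x_b):
        h_a = torch.relu(self.enc_a(x_a))
        h_b = torch.relu(self.enc_b(x_b))
        # Each task can borrow the other's internal processing if it helps.
        return (self.head_a(h_a + self.cross_ba(h_b)),
                self.head_b(h_b + self.cross_ab(h_a)))

model = CrossConnectedPair()

def cross_penalty(lam=1e-3):
    """L1 penalty that pushes unused cross-connections toward zero."""
    return lam * (model.cross_ab.weight.abs().sum() + model.cross_ba.weight.abs().sum())

# ... train jointly on both tasks (adding cross_penalty() to the loss), then inspect:
print("A->B cross-weight norm:", model.cross_ab.weight.norm().item())
print("B->A cross-weight norm:", model.cross_ba.weight.norm().item())
```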