In this post, the author proposes a semiformal definition of the concept of “optimization”. This is potentially valuable since “optimization” is a word often used in discussions about AI risk, and much confusion can follow from sloppy use of the term or from different people understanding it differently. While the definition given here is a useful perspective, I have some reservations about the claims made about its relevance and applications.
The key paragraph, which summarizes the definition itself, is the following:
An optimizing system is a system that has a tendency to evolve towards one of a set of configurations that we will call the target configuration set, when started from any configuration within a larger set of configurations, which we call the basin of attraction, and continues to exhibit this tendency with respect to the same target configuration set despite perturbations.
In fact, “continues to exhibit this tendency with respect to the same target configuration set despite perturbations” is redundant: clearly, as long as a perturbation doesn’t push the system out of the basin of attraction, the tendency must continue.
This is what is known as an “attractor” in dynamical systems theory. For comparison, here is the definition of “attractor” from Wikipedia:
In the mathematical field of dynamical systems, an attractor is a set of states toward which a system tends to evolve, for a wide variety of starting conditions of the system. System values that get close enough to the attractor values remain close even if slightly disturbed.
The author acknowledges this connection, although he also makes the following remark:
We have discussed systems that evolve towards target configurations along some dimensions but not others (e.g. ball in a valley). We have not yet discovered whether dynamical systems theory explicitly studies attractors that operate along a subset of the system’s dimensions.
I find this remark confusing. An attractor that operates along a subset of the dimensions is just an attractor submanifold. This is completely standard in dynamical systems theory.
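To make this concrete, here is a minimal numerical sketch (my own illustration, not taken from the post; the parameters are arbitrary): a damped ball in a valley that is parabolic along x but flat along y. The target configuration set is the line {x = 0, vx = 0}, an attractor submanifold; the basin of attraction is the whole state space, and a one-off perturbation changes where the system ends up along y but not the fact that it converges along x.

```python
import numpy as np

def step(state, dt=0.01, damping=1.0, k=1.0):
    """One Euler step for a damped ball in a valley that is parabolic in x
    and flat in y. The attractor is the line {x = 0, vx = 0}: a submanifold,
    not a single point, so the system "optimizes" along x but not along y."""
    x, y, vx, vy = state
    ax = -k * x - damping * vx   # restoring force plus friction along x
    ay = -damping * vy           # only friction along y (no restoring force)
    return np.array([x + dt * vx, y + dt * vy, vx + dt * ax, vy + dt * ay])

def evolve(state, n_steps=20_000, perturb_at=5_000, perturbation=None):
    """Evolve the system, optionally kicking it once along the way."""
    for t in range(n_steps):
        if perturbation is not None and t == perturb_at:
            state = state + perturbation
        state = step(state)
    return state

start = np.array([3.0, -2.0, 0.0, 0.0])   # (x, y, vx, vy)
kick = np.array([1.5, 4.0, 0.0, 0.0])     # a one-off perturbation
print(evolve(start))                      # x and vx go to ~0; y stays put
print(evolve(start, perturbation=kick))   # still x, vx -> ~0, but a different y
```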
Given that the definition itself is not especially novel, the post’s main claim to value is via the applications. Unfortunately, some of the proposed applications seem to me poorly justified. Specifically, I want to talk about two major examples: the claimed relationship to embedded agency and the claimed relationship to comprehensive AI services.
In both cases, the main shortcoming of the definition is that there is an essential property of AI that this definition doesn’t capture at all. The author does acknowledge that “goal-directed agent system” is a distinct concept from “optimizing system”. However, he doesn’t explain how they are distinct.
One way to formulate the difference is as follows: agency = optimization + learning. An agent is not just capable of steering a particular universe towards a certain outcome; it is capable of steering an entire class of universes, without knowing in advance in which universe it was placed. This underlies all of RL theory; it is implicit in Shane Legg’s definition of intelligence and in my own[1]; it is what Yudkowsky calls “cross domain”.
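To illustrate the distinction with a toy example (my own sketch, not the post’s; the names and numbers are arbitrary): a two-armed bandit stands in for the “class of universes”. A policy hard-coded for one arm is an optimizing system for one particular universe; an epsilon-greedy learner does reasonably well in every universe of the class without knowing in advance which one it was placed in.

```python
import random

class Environment:
    """A two-armed bandit; which arm pays off is the 'universe' the agent is placed in."""
    def __init__(self, good_arm):
        self.good_arm = good_arm
    def pull(self, arm):
        p = 0.9 if arm == self.good_arm else 0.1
        return 1.0 if random.random() < p else 0.0

def hardcoded_optimizer(env, steps=1000):
    """Steers one particular universe: always pulls arm 0.
    Great if arm 0 happens to be the good one, terrible otherwise."""
    return sum(env.pull(0) for _ in range(steps))

def learning_agent(env, steps=1000, eps=0.1):
    """Steers a whole class of universes: estimates arm values online
    (epsilon-greedy) without knowing in advance which universe it is in."""
    counts, values, total = [0, 0], [0.0, 0.0], 0.0
    for _ in range(steps):
        arm = random.randrange(2) if random.random() < eps else values.index(max(values))
        reward = env.pull(arm)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # running average
        total += reward
    return total

for good_arm in (0, 1):  # the class of possible universes
    env = Environment(good_arm)
    print(good_arm, hardcoded_optimizer(env), learning_agent(env))
```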
The issue of learning is not just nitpicking: it is crucial for delineating the boundary around “AI risk”, and delineating that boundary is crucial for thinking constructively about solutions. If we ignore learning and just talk about “optimization risks”, then we have to include the risk of pandemics (because bacteria are optimizing for infection), the risk of false vacuum collapse in particle accelerators (because vacuum bubbles are optimizing for expansion), the risk of runaway global warming (because it is optimizing for increasing temperature), et cetera. But these are very different risks that require very different solutions.
There is another, less central, difference: the author requires a particular set of “target states”, whereas in the context of agency it is more natural to consider utility functions, which give a continuous gradation over states rather than a binary split into “good states” and “bad states”. This is related to the difference the author points out between his definition and Yudkowsky’s:
When discerning the boundary between optimization and non-optimization, we look principally at robustness — whether the system will continue to evolve towards its target configuration set in the face of perturbations — whereas Yudkowsky looks at the improbability of the final configuration.
The improbability of the final configuration is a continuous metric, whereas just arriving or not arriving at a particular set is discrete.
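For concreteness, one way to write the contrast down (my own formalization, not taken from either post; here $\Phi_t$ is a hypothetical flow map, $T$ the target set, $u$ a utility function, and $\mu$ a reference measure over states): reaching a target set is a binary event, while the improbability-of-outcome view yields a graded quantity.

```latex
% Discrete criterion: does the trajectory starting from x reach the target set T?
\mathrm{success}(x) \;=\; \mathbf{1}\!\left[\lim_{t \to \infty} \Phi_t(x) \in T\right] \;\in\; \{0, 1\}

% Continuous criteria: a utility function over states, and (in the spirit of
% Yudkowsky's "measuring optimization power") the improbability of doing at
% least as well as the achieved state x^*, under the reference measure mu:
u : X \to \mathbb{R}, \qquad
\mathrm{OP}(x^{*}) \;=\; -\log_2 \, \mu\bigl(\{\, x \in X : u(x) \ge u(x^{*}) \,\}\bigr)
```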
Let’s see how this shortcoming affects the conclusions. About embedded agency, the author writes:
One could view the Embedded Agency work as enumerating the many logical pitfalls one falls into if one takes the “optimizer” concept as the starting point for designing intelligent systems, rather than “optimizing system” as we propose here.
The correct starting point is “agent”, defined in the way I gestured at above. If instead we start with “optimizing system”, then we throw the baby out with the bathwater, since the crucial aspect of learning is ignored. This is an essential property of the embedded agency problem: arguably the entire difficulty is about how we can define learning without introducing unphysical dualism (indeed, I have recently addressed this problem, and “optimizing system” doesn’t seem very helpful there).
About comprehensive AI services:
Our perspective is that there is a specific class of intelligent systems — which we call optimizing systems — that are worthy of special attention and study due to their potential to reshape the world. The set of optimizing systems is smaller than the set of all AI services, but larger than the set of goal-directed agentic systems.
What is an example of an optimizing AI system that is not agentic? The author doesn’t give such an example and instead talks about trees, which are not AIs. I agree that the class of dangerous systems is substantially wider than the class of systems that were explicitly designed with agency in mind. However, this is precisely because agency can arise in such systems even when it was not explicitly designed in, and moreover this is hard to avoid if the system is to be powerful enough for pivotal acts. It is not because there is some class of “optimizing AI systems” that is intermediate between “agentic” and “non-agentic”.
To summarize, I agree with and encourage the use of tools from dynamical systems theory to study AI. However, one must acknowledge the correct scope of these tools and what they don’t do. Moreover, more work is needed before truly novel conclusions can be obtained by these means.
[1] Modulo issues with traps, which I will not go into here.