The ground of optimization
This work was supported by OAK, a monastic community in the Berkeley hills. This document could not have been written without the daily love of living in this beautiful community. The work involved in writing this cannot be separated from the sitting, chanting, cooking, cleaning, crying, correcting, fundraising, listening, laughing, and teaching of the whole community.
What is optimization? What is the relationship between a computational optimization process — say, a computer program solving an optimization problem — and a physical optimization process — say, a team of humans building a house?
We propose the concept of an optimizing system as a physically closed system containing both that which is being optimized and that which is doing the optimizing, and defined by a tendency to evolve from a broad basin of attraction towards a small set of target configurations despite perturbations to the system. We compare our definition to that proposed by Yudkowsky, and place our work in the context of Demski and Garrabrant’s Embedded Agency and Drexler’s Comprehensive AI Services. We show that our definition resolves difficult cases proposed by Daniel Filan. We work through numerous examples of biological, computational, and simple physical systems, showing how our definition relates to each.
Introduction
In the field of computer science, an optimization algorithm is a computer program that outputs the solution, or an approximation thereof, to an optimization problem. An optimization problem consists of an objective function to be maximized or minimized, and a feasible region within which to search for a solution. For example we might take the objective function $(x^2 - 2)^2$ as a minimization problem and the whole real number line as the feasible region. The solution then would be $x = \sqrt{2}$ (or, by symmetry, $x = -\sqrt{2}$) and a working optimization algorithm for this problem is one that outputs a close approximation to this value.
In the field of operations research and engineering more broadly, optimization involves improving some process or physical artifact so that it is fit for a certain purpose or fulfills some set of requirements. For example, we might choose to measure a nail factory by the rate at which it outputs nails, relative to the cost of production inputs. We can view this as a kind of objective function, with the factory as the object of optimization just as the variable x was the object of optimization in the previous example.
There is clearly a connection between optimizing the factory and optimizing for x, but what exactly is this connection? What is it that identifies an algorithm as an optimization algorithm? What is it that identifies a process as an optimization process?
The answer proposed in this essay is: an optimizing system is a physical process in which the configuration of some part of the universe moves predictably towards a small set of target configurations from any point in a broad basin of attraction, despite perturbations during the optimization process.
We do not imagine that there is some engine or agent or mind performing optimization, separately from that which is being optimized. We consider the whole system jointly — engine and object of optimization — and ask whether it exhibits a tendency to evolve towards a predictable target configuration. If so, then we call it an optimizing system. If the basin of attraction is deep and wide then we say that this is a robust optimizing system.
An optimizing system as defined in this essay is known in dynamical systems theory as a dynamical system with one or more attractors. In this essay we show how this framework can help to understand optimization as manifested in physically closed systems containing both engine and object of optimization.
In this way we find that optimizing systems are not something that is designed but something that is discovered. The configuration space of the world contains countless pockets shaped like small and large basins, such that if the world should crest the rim of one of these pockets then it will naturally evolve towards the bottom of the basin. We care about them because we can use our own agency to tip the world into such a basin and then let go, knowing that from here on things will evolve towards the target region.
All optimization basins have a finite extent. A ball may roll to the center of a valley if initially placed anywhere within the valley, but if it is placed outside the valley then it will roll somewhere else entirely, or perhaps will not roll at all. Similarly, even a very robust optimizing system has an outer rim to its basin of attraction, such that if the configuration of the system is perturbed beyond that rim then the system no longer evolves towards the target that it once did. When an optimizing system deviates beyond its own rim, we say that it dies. An existential catastrophe is when the optimizing system of life on Earth moves beyond its own outer rim.
Example: computing the square root of two
Say I ask my computer to compute the square root of two, for example by opening a Python interpreter and typing:
>>> import math
>>> print(math.sqrt(2))
1.4142135623730951
The value printed here can be calculated by solving an optimization problem. It works roughly as follows. First we set up an objective function that attains its minimum at the square root of two. One function we could use is $f(x) = (x^2 - 2)^2$.
Next we pick an initial estimate for the square root of two, which can be any positive number, provided the step size is small enough for that starting point. Let’s take 1.0 as our initial guess. Then we take a gradient step in the direction indicated by computing the slope of the objective function at our initial estimate.
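As a worked instance of this first step, assume the objective $f(x) = (x^2 - 2)^2$ and the step size of $10^{-3}$ used in the code later in this section. The symbols $f$, $x_0$, $x_1$ and $\eta$ are labels introduced here for illustration only.

```latex
\begin{aligned}
f'(x)   &= 4x\,(x^2 - 2) \\
f'(x_0) &= f'(1.0) = 4 \cdot 1.0 \cdot (1.0^2 - 2) = -4 \\
x_1     &= x_0 - \eta\, f'(x_0) = 1.0 - 10^{-3} \cdot (-4) = 1.004
\end{aligned}
```

The slope at the initial estimate is negative, so the update moves the estimate upward, in the direction of $\sqrt{2} \approx 1.41421$.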
Then we repeat this process of computing the slope and updating our estimate over and over, and our optimization algorithm converges to the square root of two.
This is gradient descent, and it can be implemented in a few lines of Python code:
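current_estimate = 1.0  # initial guess
step_size = 1e-3
while True:
    # objective being minimized (computed for reference; not used below)
    objective = (current_estimate**2 - 2) ** 2
    # slope of the objective at the current estimate
    gradient = 4 * current_estimate * (current_estimate**2 - 2)
    # stop once the slope is numerically flat
    if abs(gradient) < 1e-8:
        break
    # step downhill
    current_estimate -= gradient * step_size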
But this program has the following unusual property: we can modify the variable that holds the current estimate of the square root of two at any point while the program is running, and the algorithm will still converge to the square root of two. That is, while the code above is running, if I drop in with a debugger and overwrite the current estimate while the loop is still executing, what will happen is that the next gradient step will start correcting for this perturbation, pushing the estimate back towards the square root of two.
If we give the algorithm time to converge to within machine precision of the actual square root of two then the final output will be bit-for-bit identical to the result we would have gotten without the perturbation.
Consider this for a moment. For most kinds of computer code, overwriting a variable while the code is running will either have no effect because the variable isn’t used, or it will have a catastrophic effect and the code will crash, or it will simply cause the code to output the wrong answer. If I use a debugger to drop in on a webserver servicing an http request and I overwrite some variable with an arbitrary value just as the code is performing a loop in which this variable is used in a central way, bad things are likely to happen! Most computer code is not robust to arbitrary in-flight data modifications.
But this code that computes the square root of two is robust to in-flight data modifications, or at least the “current estimate” variable is. It’s not that our perturbation has no effect: if we change the value, the next iteration of the algorithm will compute the objective function and its slope at a completely different point, and each iteration after that will be different to how it would have been if we hadn’t intervened. The perturbation may change the total number of iterations before convergence is reached. But ultimately the algorithm will still output an estimate of the square root of two, and, given time to fully converge, it will output the exact same answer it would have output without the perturbation. This is an unusual breed of computer program indeed!
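The debugger intervention described above can be imitated without a debugger. The following is a minimal sketch, assuming the same objective, initial guess, step size, and stopping rule as the loop shown earlier; the function name, the `perturb_at` and `perturb_to` parameters, and the iteration cap are hypothetical additions for illustration.

```python
import math


def sqrt_two_by_gradient_descent(perturb_at=None, perturb_to=None,
                                 step_size=1e-3, max_iterations=100_000):
    """Minimize (x**2 - 2)**2 by gradient descent.

    Optionally overwrite the estimate once, mid-run, to mimic someone
    changing the variable from a debugger (hypothetical parameters).
    Returns the final estimate and the index of the last iteration.
    """
    current_estimate = 1.0
    for iteration in range(max_iterations):
        if iteration == perturb_at:
            current_estimate = perturb_to  # the in-flight perturbation
        gradient = 4 * current_estimate * (current_estimate**2 - 2)
        if abs(gradient) < 1e-8:
            break
        current_estimate -= gradient * step_size
    return current_estimate, iteration


undisturbed, n_undisturbed = sqrt_two_by_gradient_descent()
perturbed, n_perturbed = sqrt_two_by_gradient_descent(perturb_at=100,
                                                      perturb_to=5.0)
print("undisturbed:", undisturbed, "after", n_undisturbed, "iterations")
print("perturbed:  ", perturbed, "after", n_perturbed, "iterations")
print("math.sqrt(2):", math.sqrt(2))
```

Both runs end close to the square root of two, although they take different paths to get there. The overwrite value 5.0 was chosen to lie inside the basin for this step size; a much larger overwrite can make the update overshoot and diverge, and a negative one leads to the negative root.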
What is happening here is that we have constructed a physical system consisting of a computer and a Python program that computes the square root of two, such that:
- for a set of starting configurations (in this case the set of configurations in which the “current estimate” variable holds any value in a broad range of positive floating point numbers),
- the system exhibits a tendency to evolve towards a small set of target configurations (in this case just the single configuration in which the “current estimate” variable is set to the square root of two),
- and this tendency is robust to in-flight perturbations to the system’s configuration (in this case robustness is limited to just the dimensions corresponding to changes in the “current estimate” variable).
In this essay I argue that systems that converge to some target configuration, and will do so despite perturbations to the system, are the systems we should rightly call “optimizing systems”.
Example: building a house
Consider a group of humans building a house. Let us consider the humans together with the building materials and construction site as a single physical system. Let us imagine that we assemble this system inside a completely closed chamber, including food and sleeping quarters for the humans, lighting, a power source, construction materials, construction blueprint, as well as the physical humans with appropriate instructions and incentives to build the house. If we just put these physical elements together we get a system that has a tendency to evolve under the natural laws of physics towards a configuration in which there is a house matching the blueprint.
We could perturb the system while the house is being built — say by dropping in at night and removing some walls or moving some construction materials about — and this physical system will recover. The team of humans will come in the next day and find the construction materials that were moved, put in new walls to replace the ones that were removed, and so on.
Just like the square root of two example, here is a physical system with:
- A basin of attraction (all the possible arrangements of viable humans and building materials)
- A target configuration set that is small relative to the basin of attraction (those in which the building materials have been arranged into a house matching the design)
- A tendency to evolve towards the target configurations when starting from any point within the basin of attraction, despite in-flight perturbations to the system
Now this system is not infinitely robust. If we really scramble the arrangement of atoms within this system then we’ll quickly wind up with a configuration that does not contain any humans, or in which the building materials are irrevocably destroyed, and then we will have a system without the tendency to evolve towards any small set of final configurations.
In the physical world we are not surprised to find systems that have this tendency to evolve towards a small set of target configurations. If I pick up my dog while he is sleeping and move him by a few inches, he still finds his way to his water bowl when he wakes up. If I pull a piece of bark off a tree, the tree continues to grow in the same upward direction. If I make a noise that surprises a friend working on some math homework, the math homework still gets done. Systems that contain living beings regularly exhibit this tendency to evolve towards target configurations, and tend to do so in a way that is robust to in-flight perturbations. As a result we are familiar with physical systems that have this property, and we are not surprised when they arise in our lives.
But physical systems in general do not have the tendency to evolve towards target configurations. If I move a billiard ball a few inches to the left while a bunch of billiard balls are energetically bouncing around a billiard table, the balls are likely to come to rest in a very different position than if I had not moved the ball. If I change the trajectory of a satellite a little bit, the satellite does not have any tendency to move back into its old orbit.
The computer systems that we have built are still, by and large, more primitive than the living systems that we inhabit, and most of them do not have the tendency to evolve robustly towards some set of target configurations. Optimization algorithms as discussed in the previous section, which do have this property, are therefore somewhat unusual.
Defining optimization
An optimizing system is a system that has a tendency to evolve towards one of a set of configurations that we will call the target configuration set, when started from any configuration within a larger set of configurations, which we call the basin of attraction, and continues to exhibit this tendency with respect to the same target configuration set despite perturbations.
Some systems may have a single target configuration towards which they inevitably evolve. Examples are a ball in a steep valley with a single local minimum, and a computer computing the square root of two. Other systems may have a set of target configurations and perturbing the system may cause it to evolve towards a different member of this set. Examples are a ball in a valley with multiple local minima, or a tree growing upwards (perturbing the tree by, for example, cutting off some branches while it is growing will probably change its final shape, but will not change its tendency to grow towards one of the configurations in which it has reached its maximum size).
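The case of a valley with multiple local minima can be sketched numerically. The following is a toy model under stated assumptions: the landscape $V(x) = (x^2 - 1)^2$, the overdamped "move against the slope" dynamics, and all function and parameter names are hypothetical choices made for illustration, not part of the definition.

```python
def roll_downhill(position, steps=5000, step_size=0.01,
                  kick_at=None, kick=0.0):
    """Overdamped ball on the hypothetical landscape V(x) = (x**2 - 1)**2.

    Each step moves the ball against the slope V'(x) = 4x(x**2 - 1).
    The two valley floors, x = -1 and x = +1, form the target
    configuration set. An optional one-off kick displaces the ball
    mid-run.
    """
    for step in range(steps):
        if step == kick_at:
            position += kick  # in-flight perturbation
        position -= step_size * 4 * position * (position**2 - 1)
    return position


def in_target_set(position, tolerance=1e-6):
    """True if the ball rests at either valley floor."""
    return min(abs(position - 1), abs(position + 1)) < tolerance


starts = [-1.8, -0.5, 0.3, 1.6]
finals = [roll_downhill(x) for x in starts]
kicked = roll_downhill(0.3, kick_at=10, kick=-0.8)

print("unperturbed endpoints:", finals)
print("start 0.3, kicked across the ridge:", kicked)
print("still in target set:", in_target_set(kicked))
```

In this sketch a kick across the ridge at $x = 0$ changes which member of the target set is reached, but the system still ends up inside the target set, which is the behavior described for systems with more than one target configuration.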
We can quantify optimizing systems in the following ways.
Robustness. Along how many dimensions can we perturb the system without altering its tendency to evolve towards the target configuration set? What magnitude perturbation can the system absorb along these dimensions? A self-driving car navigating through a city may be robust to perturbations that involve physically moving the car to a different position on the road in the city, but not to perturbations that involve changing the state of physical memory registers that contain critical bits of computer code in the car’s internal computer.
Duality. To what extent can we identify subsets of the system corresponding to “that which is being optimized” and “that which is doing the optimization”? That is, to what extent does the system divide into engine and object of optimization, or into agent and world? Highly dualistic systems may be robust to perturbations of the object of optimization, but brittle with respect to perturbations of the engine of optimization. For example, a system containing a 2020s-era robot moving a vase around is a dualistic optimizing system: there is a clear subset of the system that is the engine of optimization (the robot), and a clear subset that is the object of optimization (the vase). Furthermore, the robot may be able to deal with a wide variety of perturbations to the environment and to the vase, but there are likely to be numerous small perturbations to the robot itself that will render it inert. In contrast, a tree is a non-dualistic optimizing system: the tree does grow towards a set of target configurations, but it makes no sense to ask which part of the tree is “doing” the optimization and which part is “being” optimized. This latter example is discussed further below.
Retargetability. Is it possible, using only a microscopic perturbation to the system, to change the system such that it is still an optimizing system but with a different target configuration set? A system containing a robot with the goal of moving a vase to a certain location can be modified by making just a small number of microscopic perturbations to key memory registers such that the robot holds the goal of moving the vase to a different location and the whole vase/robot system now exhibits a tendency to evolve towards a different target configuration. In contrast, a system containing a ball rolling towards the bottom of a valley cannot generally be modified by any microscopic perturbation such that the ball will roll to a different target location. A tree is an intermediate example: to cause the tree to evolve towards a different target configuration set — say, one in which its leaves were of a different shape — one would have to modify the genetic code simultaneously in all of the tree’s cells.
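The robot-and-vase case can be illustrated with a deliberately tiny model. This is a sketch under strong assumptions: the world is one-dimensional, the robot's goal lives in a single number standing in for its "key memory registers", and every name below is hypothetical.

```python
def run_robot_and_vase(vase_position, goal_register, steps=200,
                       retarget_at=None, new_goal=None):
    """Toy 1-D robot/vase system (all names are hypothetical).

    Each step the robot moves the vase at most one unit towards the
    location stored in goal_register. Optionally the register is
    overwritten once, mid-run: a small perturbation that changes the
    target without touching the vase.
    """
    for step in range(steps):
        if step == retarget_at:
            goal_register = new_goal  # the retargeting perturbation
        error = goal_register - vase_position
        vase_position += max(-1.0, min(1.0, error))
    return vase_position


print(run_robot_and_vase(0.0, 10.0))
print(run_robot_and_vase(0.0, 10.0, retarget_at=5, new_goal=-20.0))
```

Overwriting one stored number leaves the system an optimizing system but with a different target configuration. A ball rolling in a valley has no comparable register: to change where it ends up, one would have to reshape the valley itself.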
Relationship to Yudkowsky’s definition of optimization
In Measuring Optimization Power, Eliezer Yudkowsky defines optimization as a process in which some part of the world ends up in a configuration that is high in an agent’s preference ordering, yet has low probability of arising spontaneously. Yudkowsky’s definition asks us to look at a patch of the world that has already undergone optimization by an agent or mind, and draw conclusions about the power or intelligence of that mind by asking how unlikely it would be for a configuration of equal or greater utility (to the agent) to arise spontaneously.
Our definition differs from this in the following ways:
- We look at whole systems that evolve naturally under physical laws. We do not assume that we can decompose these systems into some engine and object of optimization, or into mind and environment. We do not look at systems that are “being optimized” by some external entity but rather at “optimizing systems” that exhibit a natural tendency to evolve towards a target configuration set. These optimizing systems may contain subsystems that have the properties of agents, but as we will see there are many instances of optimizing systems that do not contain dualistic agentic subsystems.
- When discerning the boundary between optimization and non-optimization, we look principally at robustness (whether the system will continue to evolve towards its target configuration set in the face of perturbations), whereas Yudkowsky looks at the improbability of the final configuration.
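For comparison, the improbability-based measure can be sketched in a few lines. This is one reading of the idea summarized above, stated as an assumption rather than as a quotation: we assume a finite configuration space with a uniform measure, and we express improbability in bits by taking a base-two logarithm. The function name and the example utilities are hypothetical.

```python
import math


def optimization_power_bits(utilities, achieved_utility):
    """Improbability, in bits, of doing at least this well by chance.

    Assumes every configuration is equally likely to arise
    spontaneously. Returns -log2 of the fraction of configurations
    whose utility is at least achieved_utility.
    """
    utilities = list(utilities)
    at_least_as_good = sum(1 for u in utilities if u >= achieved_utility)
    if at_least_as_good == 0:
        raise ValueError("no configuration reaches the achieved utility")
    return -math.log2(at_least_as_good / len(utilities))


# Hypothetical space of 1024 configurations with utilities 0..1023.
space = range(1024)
best_case = optimization_power_bits(space, 1023)   # 1 of 1024 is this good
middling = optimization_power_bits(space, 512)     # 512 of 1024 are this good
print(best_case, middling)
```

Note that this calculation inspects only the final configuration and a preference ordering. It says nothing about how the system would respond to perturbations along the way, which is the property our definition is built around.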
Relationship to Drexler’s Comprehensive AI Services
Eric Drexler has written about the need to consider AI systems that are not goal-directed agents. He points out that the most economically important AI systems today are not constructed within the agent paradigm, and that in fact agents represent just a tiny fraction of the design space of intelligent systems. For example, a system that identifies faces in images would be an intelligent system but not an agent according to Drexler’s taxonomy. This perspective is highly relevant to our discussion here since we seek to go beyond the narrow agent model in which intelligent systems are conceived of as unitary entities that receive observations from the environment, send actions back into the environment, but are otherwise separate from the environment.
Our perspective is that there is a specific class of intelligent systems — which we call optimizing systems — that are worthy of special attention and study due to their potential to reshape the world. The set of optimizing systems is smaller than the set of all AI services, but larger than the set of goal-directed agentic systems.
Figure: relationship between our optimizing system concept and Drexler’s taxonomy of AI systems
Examples of systems that lie in each of these three tiers are as follows:
- A system that identifies faces in images by evaluating a feed-forward neural network is an AI system but not an optimizing system.
- A tree is an optimizing system but not a goal-directed agent system (see section below analyzing a tree as an optimizing system).
- A robot with the goal of moving a ball to a specific destination is a goal-directed agent system.
Relationship to Garrabrant and Demski’s Embedded Agency
Scott Garrabrant and Abram Demski have written about the many ways that a dualistic view of agency in which one conceives of a hard separation between agent and environment fails to capture the reality of agents that are reducible to the same basic building-blocks as the environments in which they are embedded. They show that if one starts from a dualistic view of agency then it is difficult to design agents capable of reflecting on and making improvements to their own cognitive processes, since the dualistic view of agency rests on a unitary agent whose cognition does not affect the world except via explicit actions. They also show that reasoning about counterfactuals becomes nonsensical if starting from a dualistic view of agency, since the agent’s cognitive processes are governed by the same physical laws as those that govern the environment, and the agent can come to notice this fact, leading to confusion when considering the consequences of actions that are different from the actions that the agent will, in fact, output.
One could view the Embedded Agency work as enumerating the many logical pitfalls one falls into if one takes the “optimizer” concept as the starting point for designing intelligent systems, rather than “optimizing system” as we propose here. The present work is strongly inspired by Garrabrant and Demski’s work. Our hope is to point the way to a view of optimization and agency that captures reality sufficiently well to avoid the logical pitfalls identified in the Embedded Agency work.
Example: ball in a valley
Consider a physical ball rolling around in a small valley. According to our definition of optimization, this is an optimizing system:
Configuration space. The system we are studying consists of the physical valley plus the ball.
Basin of attraction. The ball could initially be placed anywhere in the valley (these are the configurations comprising the basin of attraction).
Target configuration set. The ball will roll until it ends up at the bottom of the valley (the set of local minima are the target configurations).
We can perturb the ball while it is “in flight”, say by changing its position or velocity, and the ball will still ultimately end up at one of the target configurations. This system is robust to perturbations along dimensions corresponding to the spatial position and velocity of the ball, but there are many more dimensions along which this system is not robust. If we change the shape of the ball to a cube, for example, then the ball will not continue rolling to the bottom of the valley.
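Robustness along the position and velocity dimensions can be checked in a small simulation. This is a sketch, not a physical model of any particular valley: it assumes a one-dimensional bowl whose restoring force is proportional to displacement, a friction force proportional to velocity, and a simple semi-implicit Euler integrator. All names and parameter values are hypothetical.

```python
def simulate_ball(position, velocity, steps=20_000, dt=0.01,
                  stiffness=1.0, friction=0.5, perturb_at=None,
                  new_position=None, new_velocity=None):
    """Ball in a 1-D bowl with friction (hypothetical parameters).

    acceleration = -stiffness * position - friction * velocity
    Optionally overwrite position and velocity once, mid-flight.
    Returns the final (position, velocity).
    """
    for step in range(steps):
        if step == perturb_at:
            position, velocity = new_position, new_velocity
        acceleration = -stiffness * position - friction * velocity
        velocity += acceleration * dt   # update velocity first
        position += velocity * dt       # then position (semi-implicit Euler)
    return position, velocity


undisturbed = simulate_ball(position=2.0, velocity=0.0)
disturbed = simulate_ball(position=2.0, velocity=0.0, perturb_at=5000,
                          new_position=3.0, new_velocity=-2.0)
print("undisturbed:", undisturbed)
print("disturbed:  ", disturbed)
```

Both runs come to rest at the bottom of the bowl (position and velocity near zero). The model has no dimension for the ball's shape, so perturbations such as turning the ball into a cube lie outside what this sketch can represent.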
Example: ball in valley with robot
Consider now a ball in a valley as above, but this time with the addition of an intelligent robot holding the goal of ensuring that the ball reaches the bottom of the valley.
Configuration space. The system we are studying now consists of the physical valley, the ball, and the robot. We consider the evolution of and perturbations to this whole joint system.
Target configuration set. As before, the target configuration is the ball being at the bottom of the valley.
Basin of attraction. As before, the basin of attraction consists of all the possible spatial locations that the ball could be placed in the valley.
We can now perturb the system along many more dimensions than in the case where there was no robot. For example, we could introduce a barrier that prevents the ball from rolling downhill past a certain point, and we can then expect a sufficiently intelligent robot to move the ball over the barrier. We can expect a sufficiently well-designed robot to be able to overcome a wide variety of hurdles that gravity would not overcome on its own. Therefore we say that this system is more robust than the system without the robot.
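The barrier example can be made concrete with a discrete toy model. The sketch below assumes a terrain made of cells with integer heights, a ball that rolls to a strictly lower neighbouring cell when one exists, and a "robot" reduced to a single rule: if the ball is at rest away from the lowest cell, carry it one cell towards that cell. The terrain values and all names are hypothetical.

```python
def settle(heights, ball, with_robot, max_steps=100):
    """Return the index of the cell where the ball ends up."""
    bottom = heights.index(min(heights))
    for _ in range(max_steps):
        neighbours = [i for i in (ball - 1, ball + 1)
                      if 0 <= i < len(heights)]
        lowest = min(neighbours, key=lambda i: heights[i])
        if heights[lowest] < heights[ball]:
            ball = lowest                        # gravity: roll downhill
        elif with_robot and ball != bottom:
            ball += 1 if bottom > ball else -1   # robot: carry towards bottom
        else:
            break                                # at rest
    return ball


valley = [0, 1, 2, 3, 4, 5]    # bottom of the valley is cell 0
blocked = [0, 1, 6, 3, 4, 5]   # cell 2 raised into a barrier

print(settle(valley, 4, with_robot=False))    # plain valley, gravity only
print(settle(blocked, 4, with_robot=False))   # barrier, gravity only
print(settle(blocked, 4, with_robot=True))    # barrier, with the robot
```

With gravity alone the ball stops behind the barrier; with the robot rule added it reaches the bottom from the same starting cell. The perturbation (inserting the barrier) is absorbed only by the system that contains the robot.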
There is a sequence of systems spanning the gap between a ball rolling in a valley, which is robust to a narrow set of perturbations and therefore we say exhibits a weak degree of optimization, up to a robot with a goal of moving a ball around in a valley, which is robust to a much wider set of perturbations, and therefore we say exhibits a stronger degree of optimization. Therefore the difference between systems that do and do not undergo optimization is not a binary distinction but a continuous gradient of increasing robustness to perturbations.
By introducing the robot to the system we have also introduced new dimensions along which the system is fragile: the dimensions corresponding to modifications to the robot itself, and in particular the dimensions corresponding to modifications to the code running on the robot (i.e. physical perturbations to the configuration of the memory cells in which the code is stored). There are two types of perturbation we might consider:
- Perturbations that destroy the robot. There are numerous ways we could cut wires or scramble computer code that would leave the robot completely non-operational. Many of these would be physically microscopic, such as flipping a single bit in a memory cell containing some critical computer code. In fact there are now more ways to break the system via microscopic perturbations compared to when we were considering a ball in a valley without a robot, since there are few ways to cause a ball not to reach the bottom of a valley by making only a microscopic perturbation to the system, but there are many ways to break modern computer systems via a microscopic perturbation.
- Perturbations that change the target configurations. We could also make physically microscopic perturbations to this system that change the robot’s goal. For example we might flip the sign on some critical computations in the robot’s code such that the robot works to place the ball at the highest point rather than the lowest. This is still a physical perturbation to the valley/ball/robot system: it is one that affects the configuration of the memory cells containing the robot’s computer code. These kinds of perturbations may point to a concept with some similarity to that of an agent. If we have a system that can be perturbed in a way that preserves the robustness of the basin of convergence but changes the target configuration towards which the system tends to evolve, and if we can find perturbations that cause the target configurations to match our own goals, then we have a way to navigate between convergence basins.
Example: computer performing gradient descent
Consider now a computer running an iterative gradient descent algorithm in order to solve an optimization problem. For concreteness let us imagine that the objective function being optimized is globally convex, in which case the algorithm will reach the global optimum given a suitable step size and sufficient time. Let us further imagine that the computer stores its current best estimate of the location of the global optimum (which we will henceforth call the “optimizand”) at some known memory location, and updates this after every iteration of gradient descent.
Since this is a purely computational process, it may be tempting to define the configuration space at the computational level — for example by taking the configuration space to be the domain of the objective function. However, it is of utmost importance when analyzing any optimizing system to ground our analysis in a physical system evolving according to the physical laws of nature, just as we have for all previous examples. The reason this is important is to ensure that we always study complete systems, not just some inert part of the system that is “being optimized” by something external to the system. Therefore we analyze this system as follows.
Configuration space. The system consists of a physical computer running some code that performs gradient descent. The configurations of the system are the physical configurations of the atoms comprising the computer.
Target configuration set. The target configuration set consists of the set of physical configurations of the computer in which the memory cells that store the optimizand contain the true location of the global optimum (or the closest floating point representation of it).
Basin of attraction. The basin of attraction consists of the set of physical configurations in which there is a viable computer and it is running the gradient descent algorithm.
Example: billiard balls
Let us now examine a system that is not an optimizing system according to our definition. Consider a billiard table with some billiard balls that are currently bouncing around in motion. Left alone, the balls will eventually come to rest in some configuration. Is this an optimizing system?
In order to qualify as an optimizing system, a system must (1) have a tendency to evolve towards a set of target configurations that is small relative to the basin of attraction, and (2) continue to evolve towards the same set of target configurations if perturbed.
If we reach in while the billiard balls are bouncing around and move one of the balls that is in motion, the system will now come to rest in a different configuration. Therefore this is not an optimizing system, because there is no set of target configurations towards which the system evolves despite perturbations. A system does not need to be robust along all dimensions in order to be an optimizing system, but a billiard table exhibits no such robust dimensions at all, so it is not an optimizing system.
Example: satellite in orbit
Consider a second example of a system that is not an optimizing system: a satellite in orbit around Earth. Unlike the billiard balls, there is no chaotic tendency for small perturbations to lead to large deviations in the system’s evolution, but neither is there any tendency for the system to come back to some target configuration when perturbed. If we perturb the satellite’s velocity or position, then from that point on it is in a different orbit and has no tendency to return to its previous orbit. There is no set of target configurations towards which the system evolves despite perturbations, so this is not an optimizing system.
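The lack of any restoring tendency can be seen from the vis-viva equation of two-body orbital mechanics, $v^2 = \mu\,(2/r - 1/a)$, which ties the semi-major axis $a$ of an orbit to the satellite's current distance $r$ and speed $v$. The sketch below assumes idealized two-body motion (no drag, no thrusters, no other bodies); the starting radius of 7000 km and the 1% speed change are arbitrary illustrative values.

```python
import math

MU_EARTH = 398_600.4418  # km^3/s^2, Earth's standard gravitational parameter


def semi_major_axis(radius_km, speed_km_s):
    """Solve the vis-viva equation v^2 = mu * (2/r - 1/a) for a."""
    return 1.0 / (2.0 / radius_km - speed_km_s**2 / MU_EARTH)


radius = 7000.0                                   # km from Earth's centre
circular_speed = math.sqrt(MU_EARTH / radius)     # speed for a circular orbit

original_axis = semi_major_axis(radius, circular_speed)
nudged_axis = semi_major_axis(radius, 1.01 * circular_speed)

print("original semi-major axis (km):", original_axis)
print("after a 1% speed increase (km):", nudged_axis)
```

Under these assumptions the semi-major axis is a conserved quantity of the new orbit, so the satellite simply stays on the larger orbit. Nothing in the dynamics pulls it back towards the old one.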
Example: a tree
Consider a patch of fertile ground with a tree growing in it. Is this an optimizing system?
Configuration space. For the sake of concreteness let us take a region of space that is sealed off from the outside world — say 100m x 100m x 100m. This region is filled at the bottom with fertile soil and at the top with an atmosphere conducive to the tree’s growth. Let us say that the region contains a single tree.
We will analyze this system in terms of the arrangement of atoms inside this region of space. Out of all the possible configurations of these atoms, the vast majority consist of a uniform hazy gas. An astronomically tiny fraction of configurations contain a non-trivial mass of complex biological nutrients making up soil. An even tinier fraction of configurations contain a viable tree.
Target configuration set. A tree has a tendency to grow taller over time, to sprout more branches and leaves, and so on. Furthermore, trees can only grow so tall due to the physics of transporting sugars up and down the trunk. So we can identify a set of target configurations in which the atoms in our region of space are arranged into a tree that has grown to its maximum size (has sprouted as many branches and leaves as it can support given the atmosphere, the soil that it is growing in, and the constraints of its own biology). There are many topologies in which the tree’s branches could divide, many positions that leaves could sprout in, and so on, so there are many configurations within the target configuration set. But this set is still tiny compared to all the ways that the same atoms could be arranged without the constraint of forming a viable tree.
Basin of attraction. This system will evolve towards the target configuration set starting from any configuration in which there is a viable tree. This includes configurations in which there is just a seed in the ground, as well as configurations in which there is a tree of small, medium, or large size. Starting from any of these configurations, if we leave the system to evolve under the natural laws of physics then the tree will grow towards its maximum size, at which point the system will be in one of the target configurations.
Robustness to perturbations. This system is highly robust to perturbations. Consider perturbing the system in any of the following ways:
- Moving soil from one place to another
- Removing some leaves from the tree
- Cutting a branch off the tree
These perturbations might change which particular target configuration is eventually reached — the particular arrangement of branches and leaves in the tree once it reaches its maximum size — but they will not stop the tree from growing taller and evolving towards a target configuration. In fact we could cut the tree right at the base of the trunk and it would continue to evolve towards a target configuration by sprouting a new trunk and growing a whole new tree.
Duality. A tree is a non-dualistic optimizing system. There is no subsystem that is responsible for “doing” the optimization, separately from that which is “being” optimized. Yet the tree does exhibit a tendency to evolve towards a set of target configurations, and can overcome a wide variety of perturbations in order to do so. There are no man-made systems in existence today that are capable of gathering and utilizing resources as flexibly as a tree, from so broad a variety of environments, and there are certainly no man-made systems that can recover from physical dismemberment to the extent that a tree can recover from being cut at the trunk.
At this point it may be tempting to say that the engine of optimization is natural selection. But recall that we are studying just a single tree growing from seed to maximum size. Can you identify a physical subset of our 100m x 100m x 100m region of space that is this engine of optimization, analogous to how we identified a physical subset of the robot-and-ball system as the engine of optimization (i.e. the physical robot)? Natural selection might be the process by which the initial system came into existence, but it is not the process that drives the growth of the tree towards a target configuration.
It may then be tempting to say that it is the tree’s DNA that is the engine of optimization. It is true that the tree’s DNA exhibits some characteristics of an engine of optimization: it remains unchanged throughout the life of the tree, and physically microscopic perturbations to it can disable the tree. But a tree replicates its DNA in each of its cells, and perturbing just one or a small number of these is not likely to affect the tree’s overall growth trajectory. More importantly, a single strand of DNA does not really have agency on its own: it requires the molecular machinery of the whole cell to synthesize proteins based on the genetic code in the DNA, and the physical machinery of the whole tree to collect and deploy energy, water, and nutrients. Just as it would be incorrect to identify the memory registers containing computer code within a robot as the “true” engine of optimization separate from the rest of the computing and physical machinery that brings this code to life, it is not quite accurate to identify DNA as an engine of optimization. A tree simply does not decompose into engine and object of optimization.
It may also be tempting to ask whether the tree can “really” be said to be undergoing optimization in the absence of any “intention” to reach one of the target configurations. But this expectation of a centralized mind with centralized intentions is really an artifact of us projecting our view of ourselves onto the world: we believe that we have a centralized mind with centralized intentions, so we focus our attention on optimizing systems with a similar structure. But this turns out to be misguided on two counts: first, the vast majority of optimizing systems do not contain centralized minds, and second, our own minds are actually far less centralized than we think! For now we put aside this question of whether optimization requires intentions and instead just work within our definition of optimizing systems, which a tree definitely satisfies.
Example: bottle cap
Daniel Filan has pointed out that some definitions of optimization would nonsensically classify a bottle cap as an optimizer, since a bottle cap causes water molecules in a bottle to stay inside the bottle, and the set of configurations in which the molecules are inside a bottle is much smaller than the set of configurations in which the molecules are each allowed to take a position either inside or outside the bottle.
In our framework we have the following:
- The system consists of a bottle, a bottle cap, and water molecules. The configuration space consists of all the possible spatial arrangements of water molecules, either inside or outside the bottle.
- The basin of attraction is the set of configurations in which the water molecules are inside the bottle.
- The target configuration set is the same as the basin of attraction.
This is not an optimizing system for two reasons.
First, the target configuration set is no smaller than the basin of attraction. To be an optimizing system there must be a tendency to evolve from any configuration within a basin of attraction towards a smaller target configuration set, but in this case the system merely remains within the set of configurations in which the water molecules are inside the bottle. This is no different from a rock sitting on a beach: due to basic chemistry there is a tendency to remain within the set of configurations in which the molecules comprising the rock are physically bound to one another, but it has no tendency to evolve from a wide basin of attraction towards a small set of target configurations.
Second, the bottle cap system is not robust to perturbations since if we perturb the position of a single water molecule so that it is outside the bottle, there is no tendency for it to move back inside the bottle. This is really just the first point above restated, since if there were a tendency for water molecules moved outside the bottle to evolve back towards a configuration in which all the water molecules were inside the bottle, then we would have a basin of attraction larger than the target configuration set.
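The distinction between merely remaining in a set and converging to it can be made concrete with a toy one-dimensional sketch. Everything here is an illustrative assumption rather than part of the framework above: the bottle is the interval [0, 1], thermal motion is ignored so that a molecule's coarse position simply persists, and `valley_step` is a hypothetical contracting system included only for contrast.

```python
def bottle_step(x: float) -> float:
    """Capped bottle (toy model): the cap blocks escape, but nothing pulls
    a displaced molecule back in, so the coarse position just persists."""
    return x


def valley_step(x: float) -> float:
    """Contrast case: each step moves the state halfway toward the target at 0."""
    return 0.5 * x


def evolve(step, x: float, n_steps: int = 50) -> float:
    """Apply the update rule `step` to the state x for n_steps steps."""
    for _ in range(n_steps):
        x = step(x)
    return x


def inside_bottle(x: float) -> bool:
    """The bottle is modelled as the interval [0, 1] (an assumption)."""
    return 0.0 <= x <= 1.0


# Perturb both systems to x = 2.0 and let them evolve.
print(inside_bottle(evolve(bottle_step, 2.0)))  # molecule never re-enters
print(abs(evolve(valley_step, 2.0)) < 1e-9)     # contracting system returns
```

Displaced to x = 2, the molecule in the bottle model stays outside, while the contracting system returns to its target from the same displacement. Only the latter has a basin of attraction larger than its target set.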
Example: the human liver
Filan also asks whether one’s liver should be considered an optimizer. Suppose we observe a human working to make money. If this person were deprived of a liver, or if their liver stopped functioning, they would presumably be unable to make money. So are we then to view the liver as an optimizer working towards the goal of making money? Filan asks this question as a challenge to Yudkowsky’s definition of optimization, since it seems absurd to view one’s liver as an optimizer working towards the goal of making money, yet Yudkowsky’s definition of optimization might classify it as such.
In our framework we have the following:
- The system consists of a human working to make money, together with the whole human economy and world.
- The basin of attraction consists of the configurations in which there is a healthy human (with a healthy liver) having the goal of making money.
- The target configurations are those in which this person’s bank balance is high. (Interestingly there is no upper bound here, so there is no fixed point but rather a continuous gradient.)
We can expect that this person is capable of overcoming a reasonably broad variety of obstacles in pursuit of making money, so we recognize that this overall system (the human together with the whole economy) is an optimizing system. But Filan would surely agree on this point and his question is more specific: he is asking whether the liver is an optimizer.
In general we cannot expect to decompose optimizing systems into an engine of optimization and object of optimization. We can see that the system has the characteristics of an optimizing system, and we may identify parts, including in this case the person’s liver, that are necessary for these characteristics to exist, but we cannot in general identify any crisp subset of the system as that which is doing the optimization. And picking various subcomponents of the system (such as the person’s liver) and asking “is this the part that is doing the optimization?” does not in general have an answer.
By analogy, suppose we looked at a planet orbiting a star and asked: “which part here is doing the orbiting?” Is it the planet or the star that is the “engine of orbiting”? Or suppose we looked at a car and noticed that the fuel pump is a complex piece of machinery without which the car’s locomotion would cease. We might ask: is this fuel pump the true “engine of locomotion”? These questions don’t have answers because they mistakenly presuppose that we can identify a subsystem that is uniquely responsible for the orbiting of the planet or the locomotion of the car. Asking whether a human liver is an “optimizer” is similarly mistaken: we can see that the liver is a complex piece of machinery that is necessary in order for the overall system to exhibit the characteristics of an optimizing system (robust evolution towards a target configuration set), but beyond this it makes no more sense to ask whether the liver is a true “locus of optimization”.
So rather than answering Filan’s question in either the positive or the negative, the appropriate move is to dissolve the concept of an optimizer, and instead ask whether the overall system is an optimizing system.
Example: the universe as a whole
Consider the whole physical universe as a single closed system. Is this an optimizing system?
The second law of thermodynamics tells us that the universe is evolving towards a maximally disordered thermodynamic equilibrium in which it cycles through various maxentropy configurations. We might then imagine that the universe is an optimizing system in which the basin of attraction is all possible configurations of matter and energy, and the target configuration set consists of the maxentropy configurations.
However, this is not quite accurate. Out of all possible configurations of the universe, the vast majority of configurations are at or close to maximum entropy. That is, if we sample a configuration of the universe at random, we have only an astronomically tiny chance of finding anything other than a close-to-uniform gas of basic particles. If we define the basin of attraction as all possible configurations of matter in the universe and the target configuration set as the set of maxentropy configurations, then the target configuration set actually contains almost the entirety of the basin of attraction, with the only configurations that are in the basin of attraction but not the target configuration set being the highly unusual configurations of matter containing stars, galaxies, and so on.
For this reason the universe as a whole does not qualify as an optimizing system under our definition. (Or perhaps it would be more accurate to say that it qualifies as an extremely weak optimizing system.)
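A rough feel for this counting argument can be had from a deliberately crude stand-in for the universe: n particles that are each in one of two states, with "near maximum entropy" read as having close to n/2 particles in each state. The model, the function name `fraction_in_band`, and the specific bands are assumptions chosen for illustration only.

```python
from fractions import Fraction
from math import comb


def fraction_in_band(n: int, lo: int, hi: int) -> Fraction:
    """Exact fraction of the 2**n configurations of n two-state particles
    that have between lo and hi (inclusive) particles in the 'up' state."""
    count = sum(comb(n, k) for k in range(lo, hi + 1))
    return Fraction(count, 2 ** n)


n = 1000
# "Target set": configurations within 5% of an even split (near max entropy).
near_equilibrium = fraction_in_band(n, 450, 550)
# Highly ordered configurations: at most 10% of particles 'up'.
highly_ordered = fraction_in_band(n, 0, 100)

print(f"near-even split:  {float(near_equilibrium):.4f}")
print(f"at most 10% 'up': {float(highly_ordered):.3e}")
```

In this toy model more than 99% of all configurations already lie in the near-even band, while the highly ordered band is astronomically small. A "target set" that covers nearly the whole configuration space leaves almost nothing to converge from, which is why the universe counts as at best a very weak optimizing system.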
Power sources and entropy
The second law of thermodynamics tells us that any closed system will eventually tend towards a maximally disordered state in which matter and energy are spread approximately uniformly through space. So if we were to isolate one of the systems explored above inside a sealed chamber and leave it for a very long period, then eventually whatever power source we put inside the sealed chamber would become depleted, then every complex material or compound in the system would degrade into its base products, and finally we would be left with a chamber filled with a uniform gaseous mixture of whatever base elements we originally put in.
So in this sense there are no optimizing systems at all, since each of the systems above evolves towards its target configuration set only for a finite period of time, after which it degrades and evolves towards a maxentropy configuration.
This is not a very serious challenge to our definition of optimization since it is common throughout physics and computer science to study various “steady-state” or “fixed point” systems even though the same objection could be made about any of them. We say that a thermometer can be used to build a heat regulator that will keep the temperature of a house within a desired range, and we do not usually need to add the caveat that eventually the house and regulator will degrade into a uniform gaseous mixture due to the heat death of the universe.
Nevertheless, two possible ways to refine our definition are:
- We could stipulate that some power source is provided externally to each system we analyze, and then perform our analysis conditional on the existence of that power source.
- We could specify a finite time horizon and say that “a system is an optimizing system if it tends towards a target configuration set up to time T”.
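The finite-horizon refinement can be sketched with a toy heat regulator running on a finite battery. The dynamics, the numbers (target 20, ambient 5, 50 units of charge), and the helper `reaches_target_by` are all hypothetical choices made for illustration, not part of the definition itself.

```python
def simulate(temp: float, battery: int, steps: int,
             target: float = 20.0, ambient: float = 5.0) -> list[float]:
    """Toy regulator: while the battery has charge, the temperature moves
    halfway toward `target` each step; once it is depleted, the temperature
    relaxes halfway toward `ambient` each step instead."""
    history = []
    for _ in range(steps):
        if battery > 0:
            temp += 0.5 * (target - temp)
            battery -= 1
        else:
            temp += 0.5 * (ambient - temp)
        history.append(temp)
    return history


def reaches_target_by(history: list[float], horizon: int,
                      target: float = 20.0, tol: float = 1e-6) -> bool:
    """True if the state is within `tol` of `target` at time step `horizon`."""
    return abs(history[horizon - 1] - target) < tol


history = simulate(temp=0.0, battery=50, steps=200)
print(reaches_target_by(history, horizon=50))   # within the powered horizon
print(reaches_target_by(history, horizon=200))  # long after depletion
```

Judged up to T = 50 the system converges to its target from a range of starting temperatures; judged at T = 200 it has relaxed to ambient. Fixing the horizon, or conditioning on the power source, is what lets us call it an optimizing system at all.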
Connection to dynamical systems theory
The concept of “optimizing system” in this essay is very close to that of a dynamical system with one or more attractors. We offer the following remarks on this connection.
- A general dynamical system is any system with a state that evolves over time as a function of the state itself. This encompasses a very broad range of systems indeed!
- In dynamical systems theory, an attractor is the term used for what we have called the target configuration set. A fixed point attractor is, in our language, a target configuration set with just one element, such as when computing the square root of two. A limit cycle is, in our language, a system that eventually stably loops through a sequence of states all of which are in the target configuration set, such as a satellite in orbit.
- We have discussed systems that evolve towards target configurations along some dimensions but not others (e.g. ball in a valley). We have not yet discovered whether dynamical systems theory explicitly studies attractors that operate along a subset of the system’s dimensions.
- There is a concept of “well-posedness” in dynamical systems theory that justifies the identification of a mathematical model with a physical system. The conditions for a model to be well-posed are (1) that a solution exists (i.e. the model is not self-contradictory), (2) that there is a unique solution (i.e. the model contains enough information to pick out a single system trajectory), and (3) that the solution changes continuously with the initial conditions (the behavior of the system is not too chaotic). This third condition seems related to, but not quite equivalent to, our notion of robustness, since robustness as we define it additionally requires that the system continue to evolve towards the same attractor state despite perturbations. Exploring this connection may present an interesting avenue for future investigation.
Conclusion
We have proposed a concept that we call “optimizing systems” to describe systems that have a tendency to evolve towards a narrow target configuration set when started from any point within a broader basin of attraction, and continue to do so despite perturbations.
We have analyzed optimizing systems along three dimensions:
- Robustness, which measures the number of dimensions along which the system is robust to perturbations, and the magnitude of perturbation along these dimensions that the system can withstand.
- Duality, which measures the extent to which an approximate “engine of optimization” subsystem can be identified.
- Retargetability, which measures the extent to which the system can be transformed via microscopic perturbations into an equally robust optimizing system but with a different target configuration set.
We have argued that the “optimizer” concept rests on an assumption that optimizing systems can be decomposed into engine and object of optimization (or agent and environment, or mind and world). We have described systems that do exhibit optimization yet cannot be decomposed this way, such as the tree example. We have also pointed out that, even among those systems that can be decomposed approximately into engine and object of optimization (for example, a robot moving a ball around), we will not in general be able to meaningfully answer the question of whether arbitrary subcomponents of the agent are optimizers or not (cf. the human liver example).
Therefore, while the “optimizer” concept clearly still has much utility in designing intelligent systems, we should be cautious about taking it as a primitive in our understanding of the world. In particular we should not expect questions of the form “is X an optimizer?” to always have answers.
In this post, the author proposes a semiformal definition of the concept of “optimization”. This is potentially valuable since “optimization” is a word often used in discussions about AI risk, and much confusion can follow from sloppy use of the term or from different people understanding it differently. While the definition given here is a useful perspective, I have some reservations about the claims made about its relevance and applications.
The key paragraph, which summarizes the definition itself, is the following:
In fact, “continues to exhibit this tendency with respect to the same target configuration set despite perturbations” is redundant: clearly as long as the perturbation doesn’t push the system out of the basin, the tendency must continue.
This is what is known as an “attractor” in dynamical systems theory. For comparison, here is the definition of “attractor” from Wikipedia:
The author acknowledges this connection, although he also makes the following remark:
I find this remark confusing. An attractor that operates along a subset of the dimensions is just an attractor submanifold. This is completely standard in dynamical systems theory.
Given that the definition itself is not especially novel, the post’s main claim to value is via the applications. Unfortunately, some of the proposed applications seem to me poorly justified. Specifically, I want to talk about two major examples: the claimed relationship to embedded agency and the claimed relations to comprehensive AI services.
In both cases, the main shortcoming of the definition is that there is an essential property of AI that this definition doesn’t capture at all. The author does acknowledge that “goal-directed agent system” is a distinct concept from “optimizing systems”. However, he doesn’t explain how they are distinct.
One way to formulate the difference is as follows: agency = optimization + learning. An agent is not just capable of steering a particular universe towards a certain outcome; it is capable of steering an entire class of universes, without knowing in advance in which universe it was placed. This underlies all of RL theory, it is implicit in Shane Legg’s definition of intelligence and in my own[1], and it is what Yudkowsky calls “cross domain”.
The issue of learning is not just nitpicking: it is crucial for delineating the boundary around “AI risk”, and delineating that boundary is crucial for thinking constructively about solutions. If we ignore learning and just talk about “optimization risks” then we will have to include the risk of pandemics (because bacteria are optimizing for infection), the risk of false vacuum collapse in particle accelerators (because vacuum bubbles are optimizing for expanding), the risk of runaway global warming (because it is optimizing for increasing temperature), et cetera. But these are very different risks that require very different solutions.
There is another, less central, difference: the author requires a particular set of “target states” whereas in the context of agency it is more natural to consider utility functions, which means there is a continuous gradation of states rather than just “good states” and “bad states”. This is related to the difference the author points out between his definition and Yudkowsky’s:
The improbability of the final configuration is a continuous metric, whereas just arriving or not arriving at a particular set is discrete.
Let’s see how this shortcoming affects the conclusions. About embedded agency, the author writes:
The correct starting point is “agent”, defined in the way I gestured at above. If instead we start with “optimizing system” then we throw the baby out with the bathwater, since the crucial aspect of learning is ignored. This is an essential property of the embedded agency problem: arguably the entire difficulty is about how we can define learning without introducing unphysical dualism (indeed, I have recently addressed this problem, and “optimizing system” doesn’t seem very helpful there).
About comprehensive AI services:
What is an example of an optimizing AI system that is not agentic? The author doesn’t give such an example and instead talks about trees, which are not AIs. I agree that the class of dangerous systems is substantially wider than the class of systems which were explicitly designed with agency in mind. However, this is precisely because agency can arise from such systems even when not explicitly designed, and moreover this is hard to avoid if the system is to be powerful enough for pivotal acts. This is not because there is some class of “optimizing AI systems” which are intermediate between “agentic” and “non-agentic”.
To summarize, I agree with and encourage the use of tools from dynamical systems theory to study AI. However, one must acknowledge the correct scope of these tools and what they don’t do. Moreover, more work is needed before truly novel conclusions can be obtained by these means.
[1] Modulo issues with traps, which I will not go into at the moment.