Two examples which I’d be interested in your comments on:
1. Consider adding a big black hole in the middle of a galaxy. Does this turn the galaxy into a system optimising for a really big black hole in the middle of the galaxy? (Credit for the example goes to Ramana Kumar).
2. Imagine that I have the goal of travelling as fast as possible. However, there is no set of states which you can point to as the “target states”, since whatever state I’m in, I’ll try to go even faster. This is another argument for, as I argue below, defining an optimising system in terms of increasing some utility function (rather than moving towards target states).
On the topic of the black hole...
There’s a way of viewing the world as a series of “forces”, each trying to control the future. Eukaryotic life is one. Black holes are another. We humans build many things, from chairs to planes to AIs. Of those three, turning on the AI feels the most like “a new force has entered the game”.
All these forces are fighting over the future, and while it’s odd to think of a black hole as an agent, sometimes when I look at it, it does feel natural to think of physics as another optimisation force that’s playing the game with us.
Great examples! Thank you.
Yes this would qualify as an optimizing system by my definition. In fact just placing a large planet close to a bunch of smaller planets would qualify as an optimizing system if the eventual result is to collapse the mass of the smaller planets into the larger planet.
This seems to me to be a lot like a ball rolling down a hill: a black hole doesn’t seem alive or agentic, and it doesn’t really respond in any meaningful way to hurdles put in its way, but yes it does qualify as an optimizing system. For this reason my definition isn’t yet a very good definition of what agency is, or what post-agency concept we should adopt. I like Rohin’s comment on how we might view agency in this framework.
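To make the “ball rolling down a hill” picture concrete, here is a minimal sketch (my own toy example, not from the post): an overdamped particle in a quadratic potential well ends up in the same narrow target region whether or not we shove it partway through, so it meets the “evolves towards a small set of target configurations despite perturbations” criterion without anything alive or agentic going on.

```python
# Toy illustration (hypothetical names): a damped particle in the potential
# U(x) = x**2 converges to the same tiny target region even if perturbed.

def simulate(x0: float, perturb_at: int | None = None, steps: int = 2000) -> float:
    """Overdamped dynamics dx/dt = -U'(x) for U(x) = x**2."""
    x, dt = x0, 0.01
    for t in range(steps):
        if t == perturb_at:
            x += 5.0          # an external shove partway through
        x -= dt * 2 * x       # step downhill along the gradient of U
    return x

print(simulate(3.0))                   # ends very close to 0.0
print(simulate(3.0, perturb_at=500))   # still ends very close to 0.0
```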
Yes it’s true that using a set of target states rather than an ordering over states means that we can’t handle cases where there is a direction of optimization but not a “destination”. But if we use an ordering over states then we run into the following problem: how can we say whether a system is robust to perturbations? Is it just that the system continues to climb the preference gradient despite perturbations? But now every system is an optimizing system, because we can always come up with some preference ordering that explains a system as an optimizing system. So then we can say “well it should be an ordering over states with a compact representation” or “it should be more compact than competing explanations”. This may be okay but it seems quite dicey to me.
It actually seems quite important to me that the definition point to systems that “get back on track” even when you push them around. It may be possible to do this with an ordering over states and I’d love to discuss this more.
Hmmm, I’m a little uncertain about whether this is the case. E.g. suppose you have a box with a rock in it, in an otherwise empty universe. Nothing happens. You perturb the system by moving the rock outside the box. Nothing else happens in response. How would you describe this as an optimising system? (I’m assuming that we’re ruling out the trivial case of a constant utility function; if not, we should analogously include the trivial case of all states being target states).
As a more general comment: I suspect that once you start digging into what “perturbation” means, and what counts as a small or big perturbation, you run into the problem that a *tiny* perturbation can transform a highly optimising system into a non-optimising system (e.g. flicking the switch to turn off the AGI). In order to quantify the size of perturbations in an interesting way, you need the pre-existing concept of which subsystems are doing the optimisation.
My preferred solution to this is just to stop trying to define optimisation in terms of *outcomes*, and start defining it in terms of *computation* done by systems. E.g. a first attempt might be: an agent is an optimiser if it does planning via abstraction towards some goal. Then we can zoom in on what all these words mean, or what else we might need to include/exclude (in this case, we’ve ruled out evolution, so we probably need to broaden it). The broad philosophy here is that it’s better to be vaguely right than precisely wrong. Unfortunately I haven’t written much about this approach publicly—I briefly defend it in a comment thread on this post though.
FYI: I think something got messed up with this link. The text of the link is a valid url, but it links to a mangled one (s.t. if you click it you get a 404 error).
That’s weird; thanks for the catch. Fixed.
Yes you’re right, this system would be described by a constant utility function, and yes this is analogous to the case where the target configuration set contains all configurations, and yes this should not be considered optimization. In the target set formulation, we can measure the degree of optimization by the size of the target set relative to the size of the basin of attraction. In your rock example, the sets have the same size, so it would make sense to say that the degree of optimization is zero.
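One concrete way to cash this out (my own assumption about how to quantify it, not a definition from the post) is to count the narrowing-down in bits, log2(|basin| / |target set|), which is exactly zero in the rock example where the two sets coincide:

```python
import math

def degree_of_optimization(basin_size: int, target_size: int) -> float:
    """Bits of narrowing-down from the basin of attraction to the target set."""
    return math.log2(basin_size / target_size)

# Rock-in-a-box universe: every configuration counts as a "target",
# so the measure is exactly zero.
print(degree_of_optimization(basin_size=10**6, target_size=10**6))  # 0.0

# A huge basin funnelling into a tiny target set scores highly.
print(degree_of_optimization(basin_size=10**6, target_size=10))     # ~16.6 bits
```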
This discussion is updating me in the direction that a preference ordering formulation is possible, but that we need some analogue of “degree of optimization” that captures how “tight” or “constrained” the system’s evolution is relative to the size of the basin of attraction. We need a way to say that a constant utility function corresponds to a degree of optimization equal to zero. We also need a way to handle the case where our utility function assigns utility proportional to entropy, since then we can describe every physical system as an optimizing system and thermodynamics ensures that we are correct. This utility function would be extremely flat and wide, with most configurations receiving near-identical utility (since the high entropy configurations constitute the vast majority of all possible configurations). I’m sure there is some way to quantify this—do you know of any appropriate measure?
The challenge here is that in order to actually deal with the case you mentioned originally—the goal of moving as fast as possible—we need a measure that is not based on the size or curvature of some local maximum of the utility function. If we are working with local maxima then we are really still working with systems that evolve towards a specific destination (although there may still be advantages to thinking this way rather than in terms of a binary set).
Nice—I’d love to hear more about this
Doesn’t the set-of-target-states version have just the same issue (or an analogous one)?
For whatever behavior the system exhibits, I can always say that the states it ends up in were part of its set of target states. So you have to count on compactness (or naturalness of description, which is basically the same thing) of the set of target states for this concept of an optimizing system to be meaningful. No?
Well, most systems don’t have a tendency to evolve towards any small set of target states despite perturbations. Most systems, if you perturb them, just go off in some different direction. For example, if you perturb most running computer programs by modifying some variable with a debugger, they do not self-correct. Same with the satellite and billiard balls example. Most systems just don’t have this “attractor” dynamic.
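A toy contrast illustrating the debugger point (my own illustration; `open_loop` and `feedback_loop` are made-up names): an ordinary accumulator carries a perturbation forward forever, while a program that keeps steering toward a setpoint absorbs it and “gets back on track”.

```python
def open_loop(steps: int = 100, perturb_at: int = 50) -> float:
    total = 0.0
    for t in range(steps):
        total += 1.0
        if t == perturb_at:
            total = 9999.0        # "debugger" overwrites the variable
    return total                  # the error persists: nothing pulls it back

def feedback_loop(setpoint: float = 100.0, steps: int = 100, perturb_at: int = 50) -> float:
    x = 0.0
    for t in range(steps):
        x += 0.2 * (setpoint - x) # keep moving a fraction of the way to the setpoint
        if t == perturb_at:
            x = 9999.0            # same perturbation
    return x                      # re-converges toward the setpoint anyway

print(open_loop())      # ~10048: the perturbation sticks
print(feedback_loop())  # ~100: the perturbation washes out
```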
Hmm, I see what you’re saying, but there still seems to be an analogy to me here with arbitrary utility functions, where you need the set of target states to be small (as you do say). Otherwise I could just say that the set of target states is all the directions the system might fly off in if you perturb it.
So you might say that, for this version of optimization to be meaningful, the set of target states has to be small (however that’s quantified), and for the utility maximization version to be meaningful, you need the utility function to be simple (however that’s quantified).
EDIT: And actually, maybe the two concepts are sort of dual to each other. If you have an agent with a simple utility function, then you could consider all its local optima to be a (small) set of target states for an optimizing system. And if you have an optimizing system with a small set of target states, then you could easily convert that into a simple utility function with a gradient towards those states.
And if your utility function isn’t simple, maybe you wouldn’t get a small set of target states when you do the conversion, and vice versa?
I’d say the utility function needs to contain one or more local optima with large basins of attraction that contain the initial state, not that the utility function needs to be simple. The simplest possible utility function is a constant function, which allows the system to wander aimlessly rather than “correct” in any way for perturbations.
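Here is a minimal sketch of the conversion discussed a couple of comments up, under the assumption of a finite one-dimensional state space: define utility as minus the distance to the nearest target state, so every local optimum sits in the target set and its basin of attraction is the whole line (and in particular contains any initial state), which is the criterion described just above.

```python
TARGETS = {42, 43}                         # a small set of target states

def utility(state: int) -> float:
    # Gradient points toward the target set; a constant utility would be the
    # degenerate "every state is a target" case mentioned earlier.
    return -min(abs(state - t) for t in TARGETS)

def hill_climb(state: int, steps: int = 200) -> int:
    """Greedy ascent on the induced utility over neighbouring states."""
    for _ in range(steps):
        state = max((state - 1, state, state + 1), key=utility)
    return state

print(hill_climb(0))    # 42: climbs into the target set
print(hill_climb(100))  # 42: lands in the target set from the other side
```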
Ah, good points!