Like, I’ll also be less surprised, but the definition using priors seems non-fundamental, in the same way that a definition of atoms that used priors would be non-fundamental. I imagine the following dialogue:
Me: What is an atom?
You: You can tell whether or not an atom is happening by ranking the worlds based on how much the hypothesis ‘an atom is happening’ predicts those worlds.
Me: Ok, something feels off about that definition. What about, how can I tell which parts of the world are atoms?
You: You can tell which parts of the world are atoms by computing how well the hypothesis ‘an atom is happening’ predicts the world after you replace various sections of the world with random noise. The smallest section of the world which when randomized reduces the probability of the hypothesis ‘an atom is happening’ to zero is what we call an atom.
Me: That still seems weird, and I don’t actually know if you’re going to be able to develop an atomic theory based off that definition.
Okay, that’s an interesting comparison. Maybe this will help: Yudkowsky’s measure of optimization is a measure of how much optimization is happening, rather than the definition. The definition is “when a system’s state moves up an ordering”. Analogously, objects have length, and you can tell “how much of an object” there is by how long it is. And if there’s no object, then it will have zero length. But that doesn’t make the definition of “object” be “a thing that has length”. Does that make sense?
Yudkowsky’s measure still feels weird to me in ways that don’t seem to apply to length, in the sense that length feels much more to me like a measure of territory-shaped things, and Yudkowsky’s measure of optimization power seems much more map-shaped (which I think Garrett did a good job of explicating). Here’s how I would phrase it:
Yudkowsky wants to measure optimization power relative to a utility function: take the rank of the state you’re in, count the number of possible states that have equal or greater rank, divide that by the total number of possible states, and take the negative log of that fraction to get a number of bits. There are two weird things about this measure, in my opinion. The first is that it’s behaviorist (what I think Garrett was getting at about distinguishing between atom and non-atom worlds). The second is that it seems like a tricky problem to coherently talk about “all possible states.”
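(To make that concrete, here’s a minimal sketch of the counting-and-log version of the measure; the toy preference ranks and state counts below are made up purely for illustration.)

```python
import math

def optimization_power_bits(achieved_rank, all_ranks):
    """Yudkowsky-style measure: -log2 of the fraction of possible states
    ranked at least as high as the state actually reached.
    `all_ranks` holds one preference rank per possible state (higher = better)."""
    at_least_as_good = sum(1 for r in all_ranks if r >= achieved_rank)
    return -math.log2(at_least_as_good / len(all_ranks))

# Toy example: 1024 possible states, preference rank = state index.
ranks = list(range(1024))
print(optimization_power_bits(1023, ranks))  # 10.0 bits: landed in the single best state
print(optimization_power_bits(512, ranks))   # 1.0 bit: landed somewhere in the top half
```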
So, like, let’s say that we have two buttons next to each other. Press one button and get the world that maxes out your utility function. Press the other and, I don’t know, you get a taco. According to Yudkowsky’s measure, pressing one of these buttons is evidence of vastly more optimization power than the other, even though, intuitively, these seem about “equally hard” from the agent’s perspective.
This is what I mean about it being “behaviorist”—with this measure you only care about which world state obtains (and how well that state ranks), but not how you got to that state. It seems clear to me that both of these are relevant in measuring optimization power. Like, conditioned on certain environments, some things become vastly easier or harder. Getting a taco is easy in Berkeley; getting a taco is hard in a desert. And if your valuation of taco utility doesn’t change, then your optimization power can end up being largely a function of your environment, and that feels… a bit weird?
On the flip side, it’s also weird that it can vary so much based on the utility function. If someone is maximally happy watching TV at home all of the time, I feel hesitant to say that they have a ton of optimization power?
The thing that feels lacking in both of these cases, to me, is the ability to talk about how hard these goals are to achieve in reality (as a function of agent and environment). Because the difficulty of achieving the same world state can vary dramatically based on the environment and the agent. Grabbing a water bottle is trivial if there is one next to me, grabbing one if I have to construct it out of thermodynamic equilibrium is vastly harder. And importantly, the difference here isn’t in my utility function, but in how the environment shapes the difficulty of my goals, and in my ability as an agent to do these different things. I would like to say that the former uses less optimization power than the latter, and that this is in part a function of the territory.
You can perhaps rescue this by using a non-uniform prior over “all possible states,” and talking about how many bits it takes to move from that distribution to the distribution we want. So, like, when I’m in the desert, the state “have a taco” is less likely than when I’m in Berkeley, so it takes more optimization power to get there. But then we run into some other problems.
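(A minimal sketch of that prior-relative version, with made-up probabilities; for a point-mass target it reduces to the negative log of the default probability of the outcome you actually hit, which is the KL divergence from the target distribution to the default one.)

```python
import math

def bits_to_reach(default_dist, achieved_outcome):
    """Optimization measured against a default distribution over outcomes:
    -log2 of the default probability of the outcome actually reached."""
    return -math.log2(default_dist[achieved_outcome])

# Made-up numbers, purely for illustration.
berkeley = {"taco": 0.25, "no taco": 0.75}
desert = {"taco": 1 / 1024, "no taco": 1023 / 1024}

print(bits_to_reach(berkeley, "taco"))  # 2.0 bits
print(bits_to_reach(desert, "taco"))    # 10.0 bits: same outcome, much less likely by default
```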
The first is what Garrett points out, that probabilities are map things, and it’s a bit… weird for our measure of a (presumably) territory thing to be dependent on them. It’s the same sort of trickiness that I don’t feel we’ve properly sorted out in thermodynamics—namely, that if we take the existence of macrostates to be reflections of our uncertainty (as Jaynes does), then it seems we are stuck saying something to the effect of “ice cubes melt because we become more uncertain of their state,” which seems… wrong.
The second is that I claim that figuring out the “default” distribution is basically the entire problem. Like, how do I know that a taco appearing in the desert is less likely than one appearing in Berkeley? How do I know that grabbing a bottle is more likely when there is a bottle rather than an equilibrium soup? Constructing the “correct” distribution over default outcomes, to the extent that that makes sense, seems to me to be the entire problem of figuring out what makes some tasks easier or harder, which is close to what we were trying to measure in the first place.
I do expect there is a way to talk about the correct default distribution, but it’s tricky, and part of why it’s so tricky is that it’s a function of both map-shaped and territory-shaped things. In any case, I don’t think you get a sensible measure of optimization or other agency terms if you can’t talk about them as things-in-the-territory (which neither of these measures really does); I’d really like to be able to. I also agree that an explanation (or measure) of atoms like the one Garrett laid out is unsatisfying; I feel unsatisfied here too, for similar reasons.
Small note: Yudkowsky’s definition is about a preference ordering, not a utility function. Indeed, this was half the reason we did the project in the first place!
The first is what Garrett points out, that probabilities are map things, and it’s a bit… weird for our measure of a (presumably) territory thing to be dependent on them. It’s the same sort of trickiness that I don’t feel we’ve properly sorted out in thermodynamics—namely, that if we take the existence of macrostates to be reflections of our uncertainty (as Jaynes does), then it seems we are stuck saying something to the effect of “ice cubes melt because we become more uncertain of their state,” which seems… wrong.
For this part, my answer is Kolmogorov complexity. An ice cube has lower K-complexity than the same amount of liquid water, which is a fact about the territory and not our maps. (And if a state has lower K-complexity, it’s more knowable; you can observe fewer bits, and predict more of the state.)
One of my ongoing threads is trying to extend this to optimization. I think a system is being objectively optimized if the state’s K-complexity is being reduced. But I’m still working through the math.
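(K-complexity isn’t computable, so as a crude illustration you can use compressed length as an upper-bound proxy; the “microstates” below are toy strings standing in for ordered vs. disordered configurations, not real physics.)

```python
import random
import zlib

def compressed_len(state: str) -> int:
    """Crude upper bound on K-complexity: byte length of the zlib-compressed state description."""
    return len(zlib.compress(state.encode()))

random.seed(0)
# Toy stand-ins for the same "amount of water" in two arrangements:
ice = "H2O " * 10_000                                           # highly regular, lattice-like
liquid = "".join(random.choice("H2O ") for _ in range(40_000))  # disordered

print(compressed_len(ice))     # small: the regular state has a short description
print(compressed_len(liquid))  # much larger: the disordered state needs a far longer description
```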
Yeah… so these are reasonable thoughts of the kind that I thought through a bunch when working on this project, and I do think they’re resolvable, but to do so I’d basically be writing out my optimization sequence.
I agree with Alexander below, though: a key part of optimization is that it is not about utility functions, it is only about a preference ordering. Utility functions are about choosing between lotteries, which is a thing that agents do, whereas optimization is just about going up an ordering. Optimization is a thing that a whole system does, which is why there’s no agent/environment distinction. Sometimes only a part of the system is responsible for the optimization, and in that case you can start to talk about separating them, and then you can ask questions about what that part would do if it were placed in other environments.
Hm, I’m not sure this problem comes up.
Say I’ve built a room-tidying robot, and I want to measure its optimisation power. The room can be in two states: tidy or untidy. A natural choice of default distribution p is my beliefs about how tidy the room will be if I don’t put the robot in it. Let’s assume I’m pretty knowledgeable and extremely confident that in that case the room will be untidy: p(untidy) = 2047/2048 and p(tidy) = 1/2048 (we do have to avoid probabilities of 0, but that’s standard in a Bayesian context). But I really do put the robot in and it gets the room tidy, for an optimisation power of −log2(1/2048) = 11 bits.
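(Spelling that computation out as a tiny sketch, using the numbers above; the second line just shows how the same event counts for fewer bits under a less confident default.)

```python
import math

def optimisation_power_bits(p_default_tidy: float) -> float:
    """Bits credited to the robot for a tidy room, measured against my
    default (no-robot) probability that the room ends up tidy."""
    return -math.log2(p_default_tidy)

print(optimisation_power_bits(1 / 2048))  # 11.0 bits, as above
print(optimisation_power_bits(1 / 2))     # 1.0 bit if I'd thought a tidy room was 50/50 anyway
```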
That 11 bits doesn’t come from any uncertainty on my part about the optimisation process, although it does depend on my uncertainty about what would happen in the counterfactual world where I don’t put the robot in the room. But becoming more confident that the room would be untidy in that world makes me see the robot as more of an optimiser.
Unlike in information theory, these bits aren’t measuring a resolution of uncertainty, but a difference between the world and a counterfactual.
I don’t see the difference between “resolution of uncertainty” and “difference between the world and a counterfactual.” To my mind, resolution of uncertainty is reducing the space of counterfactuals, e.g., if I’m not sure whether you’ll say yes or no, then you saying “yes” reduces my uncertainty by one bit, because there were two counterfactuals.
I think what Garrett is gesturing at here is more like “There is just one way the world goes: the robot cleans the room or it doesn’t. If I had all the information about the world, I would see that the robot does clean the room, i.e., I would have no uncertainty about this, and therefore there is no relevant counterfactual. It’s not as if the robot could have failed to clean the room; I know that it didn’t. In other words, as I gain information about the world, the distance between counterfactual worlds and actual worlds grows smaller, and then so does… the optimization power? That’s weird.”
Like, we want to talk about optimization power here as “moving the world more into your preference ordering, relative to some baseline” but the baseline is made out of counterfactuals, and those live in the mind. So we end up saying something in the vicinity of optimization power being a function of maps, which seems weird to me.
The above formulas rely on comparing the actual world to a fixed counterfactual baseline. Gaining more information about the actual world might make the distance between the counterfactual baseline and the actual world grow smaller, but it also might make it grow bigger, so it’s not the case that the optimisation power goes to zero as my uncertainty about the world decreases. You can play with the formulas and see.
But maybe your objection is not so much that the formulas actually spit out zero, but that if I become very confident about what the world is like, it stops being coherent to imagine it being different? This would be a general argument against using counterfactuals to define anything. I’m not convinced of it, but if you like you can purge all talk of imagining the world being different, and just say that measuring optimisation power requires a controlled experiment: set up the messy room, record what happens when you put the robot in it, set the room up the same, and record what happens with no robot.
But then, if you know everything, nothing will ever be an optimizer!