When I started writing this comment I was confused. Then, I think, I got myself somewhat less confused. I am going to say a bunch of things to explain my confusion and how I tried to get less confused, and then I will ask a couple of questions. This comment got really long, and I may decide that it should be a post instead.
Take a system X with 8 possible states. Imagine X is a simplified Rubik's-cube-type puzzle. (Thinking about mechanical Rubik's cube solvers is how I originally got confused, but using actual Rubik's cubes to explain would make the math harder.) Suppose I want to measure the optimization power of two different optimizers that operate on X and share the following preference ordering:
$x_1 \sim x_2 \sim x_3 \sim x_4 \sim x_5 \sim x_6 < x_7 < x_8$
When I let optimizer1 operate on X, it always leaves $X = x_8$. So the first time I give X to optimizer1, I get:
$OP = \log_2(8/1) = 3$
If I give X to optimizer1 a second time I get:
$OP(X_1) = \log_2(8/1) = 3$
$OP(X_2) = \log_2(8/1) = 3$
$OP = \log_2(64/1) = OP(X_1) + OP(X_2) = 6$
This seems a bit weird to me. If we are imagining a mechanical robot with a camera that solves a Rubik's-cube-like puzzle, it seems weird to say that the solver gets stronger if I let it operate on the puzzle twice. I guess this would make sense for a measure of optimization pressure exerted, rather than a measure of the power of the system, but that doesn't seem to be exactly what the post was going for. We could fix this by dividing by the number of times we give X to optimizer1, and then we would get 3 no matter how many times we let optimizer1 operate on X. This would avoid the weird result that a mechanical puzzle solver gets more powerful the more times we let it operate on the puzzle.
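To make the bookkeeping concrete, here is a minimal sketch of this calculation in Python. Everything in it (the state names, the utility encoding of the preference ordering, the uniform base measure) is my own illustrative choice for the toy system, not anything from the original post.

```python
import math

# Toy system: 8 states with a uniform base measure in the absence of optimization.
# The utilities just encode the preference ordering x1 ~ ... ~ x6 < x7 < x8.
STATES = [f"x{i}" for i in range(1, 9)]
UTILITY = {**{s: 0 for s in STATES}, "x7": 1, "x8": 2}
BASE_MEASURE = {s: 1 / 8 for s in STATES}

def op_bits(observed: str) -> float:
    """OP of a single observed outcome: -log2 of the base-measure mass of the
    states that are at least as preferred as the observed one."""
    mass = sum(p for s, p in BASE_MEASURE.items()
               if UTILITY[s] >= UTILITY[observed])
    return -math.log2(mass)

runs = ["x8", "x8"]                    # optimizer1 leaves x8 both times
total = sum(op_bits(o) for o in runs)  # 3 + 3 = 6 bits if we just add
per_run = total / len(runs)            # 3 bits, the "divide by runs" fix
print(total, per_run)
```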
Say that when I let optimizer2 operate on X, it leaves $X = x_7$ with probability $p$ and leaves $X = x_8$ with probability $1-p$, but I do not know $p$. If I let optimizer2 operate on X one time, and I observe $X = x_7$, I get:
$OP = \log_2(8/2) = 2$
If I let optimizer2 operate on X three times, and I observe $X_1 = x_7$, $X_2 = x_7$, and $X_3 = x_8$, then I get:
$OP(X_1) = \log_2(8/2) = 2$
$OP(X_2) = \log_2(8/2) = 2$
$OP(X_3) = \log_2(8/1) = 3$
$OP = \log_2(512/4) = OP(X_1) + OP(X_2) + OP(X_3) = 7$
Now we could use the same trick we used before and divide by the number of instances on which optimizer2 was allowed to exert optimization pressure, and this would give us 7/3. The thing is, though, that we do not know $p$, and it seems like $p$ is relevant to how strong optimizer2 is. We can estimate $1-p$, the probability that optimizer2 leaves $X = x_8$, to be 2/5 using Laplace's rule, but it might be that the long-run frequency with which optimizer2 leaves $X = x_8$ is actually 0.9999 and we just got unlucky. (I'm not a frequentist; long-run frequency just seemed like the closest concept. Feel free to replace "long-run frequency" with the probability a Solomonoff bot using the correct language assigns in the limit, or anything else reasonable.) If that long-run frequency is in fact that large, then we are underestimating the power of optimizer2 just because we got a bad sample of its performance. The higher $1-p$ is, the more we underestimate optimizer2 when we measure its power from these observations.
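Continuing the toy numbers (again just an illustrative sketch, with the 2-bit and 3-bit per-run values taken from the calculation above), this is the comparison I have in mind:

```python
observations = ["x7", "x7", "x8"]
n = len(observations)
k = observations.count("x8")

# Laplace's rule of succession for the probability that optimizer2 leaves x8:
p_x8_laplace = (k + 1) / (n + 2)       # (1 + 1) / (3 + 2) = 2/5

observed_avg = (2 + 2 + 3) / n         # 7/3 bits, the sample estimate

def expected_op(p_x8: float) -> float:
    """Expected per-run OP if optimizer2 leaves x8 with probability p_x8."""
    return p_x8 * 3 + (1 - p_x8) * 2

print(p_x8_laplace, observed_avg, expected_op(0.9999))
# If the true frequency of x8 were 0.9999, the expected per-run OP would be
# about 2.9999 bits, so the observed 7/3 bits underestimates optimizer2.
```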
So it seems like there is another thing we need to know, besides the preference ordering of an optimizer, the measure over the target system in the absence of optimization, and the observed state of the target system, in order to perfectly measure the optimization power of an optimizer. In this case, it seems like we need to know $p$. This is a pretty easy fix: we can just take the expectation of the optimization power as originally defined with respect to the probability of observing that state when the optimizer is present, but it does seem more complicated, and it is different.
With $o$ being the observed outcome, $U$ being the utility function of the optimization process, and $P$ being the distribution over outcomes in the absence of optimization, I took the definition in the original post to be:
$-\log_2\left(\sum_{i \in \{A \mid U(A_i) \geq U(o)\}} P(A_i)\right)$
The definition I am proposing instead is:
$\mathbb{E}_{P(o \mid \text{optimizer})}\left[-\log_2\left(\sum_{i \in \{A \mid U(A_i) \geq U(o)\}} P(A_i \mid \lnot\text{optimizer})\right)\right]$
That is, you take the expectation of the original measure with respect to the distribution over outcomes you expect to observe in the presence of optimization. We could then call the original measure "optimization pressure exerted" and the second measure "optimization power." For systems that are only allowed to optimize once, like humans, these values are very similar; for systems that might exert their full optimization power on several occasions depending on circumstance, like Rubik's cube solvers, these values will differ more the more times the system is allowed to optimize. We can think of the first measure as the actual amount of optimization pressure that was exerted on the target system on a particular instance, and the second measure as the expected amount of optimization pressure that the optimizer exerts on the target system.
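Here is how I would compute the proposed measure for the toy example, as a sketch. The outcome distribution I plug in for optimizer2 is just the Laplace estimate from above; nothing here is meant as the canonical implementation.

```python
import math

# Same toy system as before: 8 states, uniform base measure, preference
# ordering x1 ~ ... ~ x6 < x7 < x8 (my own illustrative encoding).
STATES = [f"x{i}" for i in range(1, 9)]
UTILITY = {**{s: 0 for s in STATES}, "x7": 1, "x8": 2}
BASE = {s: 1 / 8 for s in STATES}      # distribution absent optimization

def op_exerted(observed: str) -> float:
    """Original measure: optimization pressure exerted on one observed outcome."""
    mass = sum(p for s, p in BASE.items() if UTILITY[s] >= UTILITY[observed])
    return -math.log2(mass)

def op_power(outcome_dist: dict) -> float:
    """Proposed measure: expectation of the original measure under the
    distribution over outcomes in the presence of the optimizer."""
    return sum(q * op_exerted(o) for o, q in outcome_dist.items())

# Using the Laplace estimates for optimizer2: P(x7) = 3/5, P(x8) = 2/5.
print(op_power({"x7": 0.6, "x8": 0.4}))   # 0.6 * 2 + 0.4 * 3 = 2.4 bits
```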
To hammer the point home, there is the amount of optimization pressure that I in fact exerted on the universe this time around. Say it was a trillion bits. Then there is the expected amount of optimization pressure that I exert on the universe in a given life. Maybe I just got lucky (or unlucky) on this go around. It could be that if you reran the universe from the point at which I was born several times while varying some things that seem irrelevant, I would on average only increase the negentropy of variables I care about by a million bits. If that were the case, then using the amount of optimization pressure that I exerted on this go around as an estimate of my optimization power in general would be a huge underestimate.
Ok, so what’s up here? This seems like an easy thing to notice, and I’m sure Eliezer noticed it.
Eliezer talks about how, from the perspective of Deep Blue, it is exerting optimization pressure every time it plays a game, but from the perspective of the programmers, creating Deep Blue was a one-time optimization cost. Is that a different way to cash out the same thing? It still seems weird to me to say that the more times Deep Blue plays chess, the higher its optimization power is. It does not seem weird to me to say that the more times a human plays chess, the higher their optimization power is; each chess game is a subsystem of that human's target system, e.g., the environment over time. Whereas it does seem weird to me to say that if you uploaded my brain and let it operate on the same universe 100 times, the optimization power of my uploaded brain would be 100 times greater than if you only did this once.
This is a consequence of one of the nice properties of Eliezer's measure: OP sums for independent systems. It makes sense that if I think an optimizer is optimizing two independent systems, then when I measure its OP with respect to the first system and add it to its OP with respect to the second, I should get the same answer I would get by treating the two systems jointly as one system. The Rubik's cube the first time I give it to a mechanical Rubik's cube solver and the second time I give it to the solver are in fact two such independent systems. So are the first time you simulate the universe after my birth and the second time. It makes sense to me that my optimization power for independent parts of the universe on a particular go-around should sum to my optimization power with respect to the two systems taken jointly as one, but it doesn't make sense to me that you should just add the optimization pressure I exert on each go to get my total optimization power. Does the measure I propose here actually sum nicely with respect to independent systems? It seems like it might, but I'm not sure.
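For what it's worth, here is a quick numerical check one could run, under the assumption that for two independent systems the joint "at least as good" region is the product of the two marginal upper sets (which is how I read the additivity property of the original measure). The setup is the same illustrative toy system as above.

```python
import math
from itertools import product

STATES = [f"x{i}" for i in range(1, 9)]
UTILITY = {**{s: 0 for s in STATES}, "x7": 1, "x8": 2}
BASE = {s: 1 / 8 for s in STATES}

def upper_mass(observed: str) -> float:
    """Base-measure mass of states at least as preferred as the observed one."""
    return sum(p for s, p in BASE.items() if UTILITY[s] >= UTILITY[observed])

def op_power(dist: dict) -> float:
    """Proposed measure on a single copy of the system."""
    return sum(q * -math.log2(upper_mass(o)) for o, q in dist.items())

def op_power_joint(dist1: dict, dist2: dict) -> float:
    """Proposed measure on two independent copies treated as one system:
    product outcome distribution, and product of the marginal upper-set
    masses for the joint 'at least as good' region."""
    return sum(q1 * q2 * -math.log2(upper_mass(o1) * upper_mass(o2))
               for (o1, q1), (o2, q2) in product(dist1.items(), dist2.items()))

d = {"x7": 0.6, "x8": 0.4}
print(op_power(d) + op_power(d), op_power_joint(d, d))  # both come out to 4.8
```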
Is this just the same as Eliezer’s proposal for measuring optimization power for mixed outcomes? Seems pretty different, but maybe it isn’t. Maybe this is another way to extend optimization power to mixed outcomes? It does take into account that the agent might not take an action that guarantees an outcome with certainty.
Is there some way that I am confused, or something in the original post that I am missing?