Differential Optimization Reframes and Generalizes Utility-Maximization
Consider the following world: a set P of past nodes, a set E of environment nodes, a set A of actor/agent nodes, and a set F of future nodes. Consider our previously defined function Op(A; p, f) for p∈P, f∈F, which represents how much A is optimizing the value of f with respect to p. For some choices of A and f, we might find that the values of Op(A; p, f) allow us to split the members of P into three categories: those with Op≫0, those with Op≈0, and those with Op≪0.
Picking P Apart
We can define Vi={p∈P∣Op(A; p, fi)≫0}. In other words, Vi is the subset of P whose local influence on the future node fi is almost entirely removed by A. If you want to be cute you can think of Vi as the victims of A with respect to fi, because as far as fi is concerned they are causally erased from existence.
Remember that Op is high when “freezing” the information going out of A means that the value of fi depends much more on the value of p. What does it mean when “freezing” the information going out of A does the opposite?
Now let’s define Ui={p∈P∣Op(A; p, fi)≪0}. This means that when we freeze A, these nodes have much less influence on the future. Their influence flows through A. We can call Ui the utility-function-like region of P with respect to fi. We might also think of the actions of A as amplifying the influence of Ui on fi.
For completeness we’ll define Xi={p∈P∣Op(A; p, fi)≈0}≡P−Ui−Vi.
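As a rough illustration of the three-way split, here is a minimal Python sketch. It assumes the values Op(A; p, fi) have already been computed somehow and are handed in as a dictionary; the function name `partition_past` and the numeric `threshold` standing in for “≫0” and “≪0” are hypothetical choices made for this example, not part of the definition of Op.

```python
from typing import Dict, Set, Tuple


def partition_past(
    op_values: Dict[str, float], threshold: float = 1.0
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split past nodes into (V_i, U_i, X_i) for a single future node f_i.

    op_values maps each past node p to a precomputed Op(A; p, f_i).
    threshold is an assumed cutoff standing in for ">> 0" and "<< 0".
    """
    victims = {p for p, op in op_values.items() if op > threshold}
    utility_like = {p for p, op in op_values.items() if op < -threshold}
    # Everything else has Op roughly zero: X_i = P - U_i - V_i.
    neutral = set(op_values) - victims - utility_like
    return victims, utility_like, neutral


# Made-up Op values for one future node f_0.
op_for_f0 = {"p0": 3.2, "p1": -2.7, "p2": 0.1, "p3": 4.0}
V0, U0, X0 = partition_past(op_for_f0)
```

Where exactly to put the cutoff is a judgment call; a split is only meaningful if the values of Op cluster well away from it.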
Let’s now define U=U0∪...∪Un, i.e. the set of points which are utility-like for at least one future node. V=V0∪...∪Vn will be the set of points which are victim-like for at least one future node. We might want to define V′=V0∩...∩Vn as the set of “total” victims, and U′=U0∩...∩Un similarly. Finally, X=P−U−V, meaning X only contains nodes which really don’t interact with A at all in an optimizing/amplifying capacity.
We can also think of U and V as the sets of past nodes which, as a result of the actions of A, have an outsizedly large and an outsizedly small influence on the future, respectively.
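The aggregate sets are plain unions and intersections of the per-future-node sets, which the following Python sketch spells out. The node names and the contents of the per-node sets are invented for illustration, and the helper names `union_all` and `intersect_all` are hypothetical.

```python
from typing import List, Set


def union_all(sets: List[Set[str]]) -> Set[str]:
    """Nodes that appear in at least one of the given sets."""
    return set().union(*sets)


def intersect_all(sets: List[Set[str]]) -> Set[str]:
    """Nodes that appear in every one of the given sets."""
    return set.intersection(*sets) if sets else set()


# Made-up per-future-node sets V_0..V_2 and U_0..U_2.
P = {"p0", "p1", "p2", "p3", "p4", "p5"}
V_per_f = [{"p0", "p3"}, {"p0"}, {"p0", "p4"}]
U_per_f = [{"p1"}, {"p1", "p2"}, {"p1"}]

V = union_all(V_per_f)            # victim-like for at least one f_i
V_total = intersect_all(V_per_f)  # V': victim-like for every f_i
U = union_all(U_per_f)            # utility-like for at least one f_i
U_total = intersect_all(U_per_f)  # U': utility-like for every f_i
X = P - U - V                     # no optimizing/amplifying interaction
```

Nothing in these definitions stops a node from landing in both U and V (victim-like for one future node, utility-like for another), so the two sets can overlap.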
Describing D
We can define D={d∈E∪F∣∃v∈V: Op(A; v, d)≫0}; in other words, D is the region of E∪F in which A is removing the influence of V. We can even define a smaller set D′={d∈E∪F∣∀v∈V′: Op(A; v, d)≫0}; in other words, the region in which A is totally removing the (local) influence of the total victim nodes V′. We can call D the domain of A.
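The difference between the “exists” in D and the “for all” in D′ can be made concrete with a small Python sketch. It assumes a precomputed table of Op(A; v, d) values keyed by (v, d) pairs. The `threshold` standing in for “≫0”, the function names, and the choice to return an empty D′ when V′ is empty (rather than letting the “for all” hold vacuously) are all assumptions made for this example.

```python
from typing import Dict, Iterable, Set, Tuple

OpTable = Dict[Tuple[str, str], float]


def domain(op: OpTable, candidates: Iterable[str], victims: Set[str],
           threshold: float = 1.0) -> Set[str]:
    """D: nodes d for which SOME victim v has Op(A; v, d) above threshold."""
    return {d for d in candidates
            if any(op.get((v, d), 0.0) > threshold for v in victims)}


def total_domain(op: OpTable, candidates: Iterable[str],
                 total_victims: Set[str], threshold: float = 1.0) -> Set[str]:
    """D': nodes d for which EVERY total victim v has Op above threshold."""
    if not total_victims:
        # Assumed convention: avoid a vacuously true "for all".
        return set()
    return {d for d in candidates
            if all(op.get((v, d), 0.0) > threshold for v in total_victims)}


# Made-up Op values over candidate nodes in E ∪ F.
op_table = {
    ("p0", "e0"): 2.5, ("p3", "e0"): 0.2,
    ("p0", "f0"): 0.1, ("p3", "f0"): 3.0,
    ("p0", "f1"): 0.0, ("p3", "f1"): 0.3,
}
candidates = ["e0", "f0", "f1"]
D = domain(op_table, candidates, victims={"p0", "p3"})
D_prime = total_domain(op_table, candidates, total_victims={"p0"})
```

With a non-empty V′⊆V, every node in D′ is also in D, which matches the description of D′ as the smaller set.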
Approaching A
One issue which we haven’t dealt with is how to actually pick the set A! We sort of arbitrarily declared it to be relevant at the start with no indication as to how we would do it.
There are lots of ways to pick out sets of nodes in a causal network. One involves thinking about minimum information-flow in a very John Wentworth-ish way. Another might be to just start with a very large choice of A and iteratively try moving nodes from A to E. If moving a node shrinks D by a lot, then that node might be important for A; otherwise it might not be. This might not always be true though!
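The second approach can be sketched as a greedy loop. This is only one possible reading of “iteratively try moving nodes from A to E”: the function `prune_agent`, the user-supplied `domain_size` callback (which would have to recompute D for each trial choice of A), and the `max_relative_loss` tolerance are hypothetical names introduced for the example, and the toy callback below is invented purely so the loop has something to run against.

```python
from typing import Callable, FrozenSet, Set


def prune_agent(initial_agent: Set[str],
                domain_size: Callable[[FrozenSet[str]], int],
                max_relative_loss: float = 0.1) -> Set[str]:
    """Greedily move nodes out of A while |D| stays roughly the same."""
    agent = set(initial_agent)
    changed = True
    while changed:
        changed = False
        baseline = domain_size(frozenset(agent))
        for node in sorted(agent):
            trial = agent - {node}
            loss = baseline - domain_size(frozenset(trial))
            # Drop the node if removing it barely shrinks D.
            if baseline == 0 or loss <= max_relative_loss * baseline:
                agent = trial
                changed = True
                break  # recompute the baseline for the smaller A
    return agent


# Toy stand-in for |D|: only nodes a0 and a1 actually matter.
CORE = {"a0", "a1"}


def toy_domain_size(agent: FrozenSet[str]) -> int:
    return {2: 10, 1: 4, 0: 0}[len(CORE & agent)]


pruned = prune_agent({"a0", "a1", "a2", "a3"}, toy_domain_size)
```

A greedy pass like this tests one node at a time, so it can miss cases where several nodes only matter jointly, or where redundant nodes can each be removed individually but not together.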
Perhaps sensible choices of A are ones which make V and V′ more similar, and U and U′ more similar, i.e. those for which nodes in P are cleanly split into utility-like and victim-like nodes.
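One way to put a number on “more similar” is a set-overlap score. The sketch below uses Jaccard similarity, which is a choice made here for illustration rather than anything the framework prescribes; the function names and the equal weighting of the two terms are likewise assumptions.

```python
from typing import Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two sets: 1.0 when identical, 0.0 when disjoint."""
    if not a and not b:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(a | b)


def split_cleanness(V: Set[str], V_total: Set[str],
                    U: Set[str], U_total: Set[str]) -> float:
    """Average similarity of V to V' and of U to U' (equal weights assumed)."""
    return 0.5 * (jaccard(V, V_total) + jaccard(U, U_total))


# Made-up sets: V' is half of V, while U and U' coincide.
score = split_cleanness(
    V={"p0", "p3"}, V_total={"p0"},
    U={"p1"}, U_total={"p1"},
)
```

A score near 1 would mean that almost every victim-like node is a total victim and almost every utility-like node is utility-like for every future node, which is the clean split described above.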
Either way, it seems like we can probably find ways to find agents in a system.
Conjectures!
I conjecture that if A is a powerful optimizer, this will be expressed by having a large set D. It will also be expressed by having the values of Op(A; v, d) be large for v∈V and d∈D.
I conjecture that nodes in V, and especially those in V′, get a pretty rough deal, and that it is bad to be a person living in V′.
I conjecture that the only nodes which get a good deal out of the actions of A are in U, and that for an AI to be aligned, U needs to contain the AI’s creator(s).
How does this help us think about optimizers?
The utility-function framework has not always been a great way to think about things. The differential optimization framework might be better, or failing that, different. The phrase “utility function” often implies a function which explicitly maps [world]→R and is explicitly represented in code. This framework defines U in a different way: as the set of regions of the past which have an oversized influence on the future via an optimizer A.
Thinking about the number of nodes in V and D (and how well the latter are optimized with respect to the former) also provides a link to Eliezer’s definition of optimization power found here. The more nodes in V, the more information we are at liberty to discard; the more nodes in D, the more of the world we can still predict; and the more heavily optimized the nodes of D, the smaller our loss of predictive power.