Differential Optimization Reframes and Generalizes Utility-Maximization
Consider the following world: a set of past nodes $P$, a set of environment nodes $E$, a set of actor/agent nodes $A$, and a set of future nodes $F$. Consider our previously defined optimization function for a past node $p \in P$ and a future node $f \in F$, which represents how much $A$ is optimizing the value of $f$ with respect to $p$. For some choices of $A$ and $f$, we might find that its values allow us to split the members of $P$ into three categories: those for which the value is very high, those for which it is strongly negative, and those for which it is neither.
Picking P Apart
We can define $V_f$ as the subset of $P$ which has no local influence on the future node $f$ whatsoever. If you want to be cute you can think of $V_f$ as the victim of $A$ with respect to $f$, because it is causally erased from existence.
Remember that the optimization function is high when “freezing” the information going out of $A$ means that the value of $f$ depends much more on the value of $p$. What does it mean when “freezing” the information of $A$ does the opposite?
Now let’s define $U_f$ as the subset of $P$ for which freezing does the opposite. This means that when we freeze $A$, these nodes have much less influence on the future. Their influence flows through $A$. We can call $U_f$ the utility-function-like region of $P$ with respect to $f$. We might also think of the actions of $A$ as amplifying the influence of $U_f$ on $f$.
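For completeness we’ll define the remainder $P \setminus (V_f \cup U_f)$: the nodes of $P$ which are neither victim-like nor utility-like with respect to $f$.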
Let’s now define $U = \bigcup_{f \in F} U_f$, i.e. the set of points which are utility-like for any future point. $V = \bigcup_{f \in F} V_f$ will be the set of points which are victim-like for any future point. We might want to define $V \setminus U$ as the set of “total” victims, and $U \setminus V$ similarly. The remainder $P \setminus (U \cup V)$ then only contains nodes which really don’t interact with $A$ at all in an optimizing/amplifying capacity.
We can also think of $U$ and $V$ as the sets of past nodes which have an outsizedly large and small influence on the future as a result of the actions of $A$, respectively.
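As a rough illustration of the bookkeeping in this section, here is a minimal Python sketch that sorts past nodes into victim-like, utility-like, and neutral sets. Everything concrete in it is an assumption made for the example: the table `op` of optimization values is made up, "victim-like" is taken to mean a value of positive infinity, and "utility-like" is taken to mean a value below an arbitrary negative cutoff `UTILITY_THRESHOLD`.

```python
import math

# Hypothetical optimization values, indexed as op[p][f].
# The numbers are invented purely for illustration.
op = {
    "p1": {"f1": math.inf, "f2": math.inf},
    "p2": {"f1": -3.0, "f2": 0.1},
    "p3": {"f1": 0.0, "f2": -0.2},
    "p4": {"f1": math.inf, "f2": -2.5},
}
futures = ["f1", "f2"]

# Assumed cutoff for "much less influence once the agent is frozen".
UTILITY_THRESHOLD = -1.0


def victims_of(f):
    """Past nodes with no local influence on f (assumed: value is +inf)."""
    return {p for p in op if op[p][f] == math.inf}


def utility_like_of(f):
    """Past nodes whose influence on f is amplified by the agent (assumed cutoff)."""
    return {p for p in op if op[p][f] < UTILITY_THRESHOLD}


P = set(op)
# Victim-like / utility-like for at least one future node.
V = set().union(*(victims_of(f) for f in futures))
U = set().union(*(utility_like_of(f) for f in futures))

total_victims = V - U      # victim-like somewhere, utility-like nowhere
total_utility = U - V      # utility-like somewhere, victim-like nowhere
neutral = P - (U | V)      # never optimized or amplified

print(sorted(V), sorted(U), sorted(total_victims), sorted(total_utility), sorted(neutral))
```

In this toy table, `p4` is a victim with respect to `f1` but utility-like with respect to `f2`, so it lands in both $U$ and $V$ and in neither "total" set.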
Describing D
We can define $D$ as the region of $F$ in which $A$ is removing the influence of $V$. We can even define a smaller set: the region in which $A$ is totally removing the (local) influence of the total victim nodes. We can call $D$ the domain of $A$.
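One possible reading of these two sets, sketched in Python. The exact definitions are assumptions made for this example: the domain is taken to be every future node with at least one victim, and the smaller set is taken to be the future nodes for which every total victim is a victim. The table `op`, the cutoff, and all names are hypothetical.

```python
import math

# Hypothetical optimization values, indexed as op[p][f]; invented for illustration.
op = {
    "p1": {"f1": math.inf, "f2": 0.0, "f3": 0.3},
    "p2": {"f1": math.inf, "f2": math.inf, "f3": -2.5},
    "p3": {"f1": 0.2, "f2": -0.1, "f3": 0.0},
}
futures = ["f1", "f2", "f3"]
UTILITY_THRESHOLD = -1.0  # assumed cutoff for "utility-like"


def victims_of(f):
    """Past nodes with no local influence on f (assumed: value is +inf)."""
    return {p for p in op if op[p][f] == math.inf}


def utility_like_of(f):
    """Past nodes whose influence on f is amplified by the agent (assumed cutoff)."""
    return {p for p in op if op[p][f] < UTILITY_THRESHOLD}


V = set().union(*(victims_of(f) for f in futures))
U = set().union(*(utility_like_of(f) for f in futures))
total_victims = V - U

# Assumed reading of the domain: future nodes that have at least one victim.
domain = {f for f in futures if victims_of(f)}

# Assumed reading of the smaller set: future nodes for which every
# total victim has had its local influence removed.
strict_domain = {
    f for f in domain
    if total_victims and total_victims <= victims_of(f)
}

print(sorted(domain), sorted(strict_domain))
```

Here `f2` is in the domain because `p2` is a victim with respect to it, but it is not in the smaller set because the only total victim, `p1`, still influences it.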
Approaching A
One issue which we haven’t dealt with is how to actually pick the set $A$! We sort of arbitrarily declared it to be relevant at the start with no indication as to how we would do it.
There are lots of ways to pick out sets of nodes in a causal network. One involves thinking about minimum information-flow in a very John Wentworth-ish way. Another might be to just start with a very large choice of $A$ and iteratively try moving nodes from $A$ to $E$. If this shrinks $D$ by a lot then this node might be important for $A$; otherwise it might not be. This might not always be true though!
Perhaps sensible choices of $A$ are ones which make $U$ and the total utility-like set $U \setminus V$ more similar, and $V$ and the total victim set $V \setminus U$ more similar, i.e. those for which nodes in $P$ are cleanly split into utility-like and victim-like nodes.
Either way, it seems like we can probably find ways to find agents in a system.
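The "start large and move nodes out" idea can be sketched as a greedy loop. This is only a toy under stated assumptions: `domain_size` is a hypothetical stand-in for whatever procedure measures the size of the agent's domain in a real causal network, `tolerance` is an arbitrary cutoff for "shrinks by a lot", and the greedy order is a simplification. As noted above, the heuristic might not always pick out the right nodes.

```python
def prune_agent(candidate, domain_size, tolerance=0.1):
    """Greedily move nodes out of the agent set when doing so barely shrinks the domain.

    candidate:   initial (deliberately large) set of agent nodes
    domain_size: hypothetical callable mapping an agent set to the size of its domain
    tolerance:   assumed maximum relative shrinkage that still counts as "not a lot"
    """
    agent = set(candidate)
    for node in sorted(candidate):  # fixed order so the result is reproducible
        if len(agent) == 1:
            break  # keep at least one node in the agent set
        before = domain_size(agent)
        after = domain_size(agent - {node})
        # If removing the node hardly changes the domain, treat it as environment.
        if before == 0 or (before - after) / before <= tolerance:
            agent.remove(node)
    return agent


def toy_domain_size(agent):
    """Made-up measure: only nodes a1 and a2 contribute to the domain."""
    return 5 * len(agent & {"a1", "a2"})


pruned = prune_agent({"a1", "a2", "x", "y"}, toy_domain_size)
print(sorted(pruned))
```

In the toy measure, removing `x` or `y` leaves the domain unchanged, so they are moved to the environment, while removing `a1` or `a2` halves it, so they stay.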
Conjectures!
I conjecture that if $A$ is a powerful optimizer, this will be expressed by having a large set $D$. It will also be expressed by having the values of the optimization function be large.
I conjecture that $V$, and especially the total victim nodes, get a pretty rough deal, and that it is bad to be a person living in $V$.
I conjecture that the only nodes which get a good deal out of the actions of $A$ are in $U$, and that for an AI to be aligned, $U$ needs to contain the AI’s creator(s).
How does this help us think about optimizers?
The utility-function framework has not always been a great way to think about things. The differential optimization framework might be better, or failing that, different. The phrase “utility function” often implies a function which is an explicit map, explicitly represented in code. This framework defines $U$ in a different way: as the set of regions of the past which have an oversized influence on the future via an optimizer $A$.
Thinking about the number of nodes in $V$ and $D$ (and how well the latter are optimized with respect to the former) also provides a link to Eliezer’s definition of optimization power. The more nodes in $V$, the more information we are at liberty to discard; the more nodes in $D$, the more of the world we can still predict; and the more heavily optimized the nodes of $D$, the smaller our loss of predictive power.
Neat. Why is it worth calling U “utility,” or even utility-like, though? If I tell you the set of things that I observe that significantly change my behavior, this tells you a lot about me but it doesn’t tell you which function of these observations I’m using to make decisions.
E.g. both teams in a soccer game will respond to the position of the ball (the ball is in U—or some relaxed notion of it, since I think your full notion might be too strong), but want to do different things with it.
I think the position of the ball is in V, since the players are responding to the position of the ball by forcing it towards the goal. It’s difficult to predict the long-term position of the ball based on where it is now. The position of the opponent’s goal would be an example of something in U for both teams. In this case both teams’ utility functions contain a robust pointer to the goal’s position.