One particular example of this phenomenon that comes to mind:
In (traditional) chess-playing software, moves are generally selected by a combination of search and evaluation: the search is (usually) some form of minimax with alpha-beta pruning, and the evaluation function assigns a value estimate to each leaf node of the search tree; these values are then propagated back to the root to select a move.
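The search/evaluation split described above can be sketched in a few lines. This is a generic sketch, not any particular engine's code; the game is supplied through hypothetical `moves`, `apply`, and `evaluate` callbacks:

```python
import math

def alphabeta(pos, depth, alpha, beta, maximizing, moves, apply, evaluate):
    """Minimax with alpha-beta pruning; `evaluate` scores leaf positions."""
    legal = moves(pos)
    if depth == 0 or not legal:
        return evaluate(pos)  # leaf node: fall back to the evaluation function
    if maximizing:
        best = -math.inf
        for m in legal:
            best = max(best, alphabeta(apply(pos, m), depth - 1,
                                       alpha, beta, False, moves, apply, evaluate))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # prune: the opponent will never allow this branch
        return best
    else:
        best = math.inf
        for m in legal:
            best = min(best, alphabeta(apply(pos, m), depth - 1,
                                       alpha, beta, True, moves, apply, evaluate))
            beta = min(beta, best)
            if alpha >= beta:
                break
        return best
```

Nothing here depends on the evaluation function being sensible; it can be swapped for a random one without touching the search.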
Typically, the evaluation function is designed by humans to reflect meaningful features of chess understanding (although recent developments in machine-learned evaluation have changed that somewhat). Obviously, this is useful because it yields more accurate estimates for leaf-node values, and hence more accurate move selection. What’s less obvious is what happens if you give a chess engine a random evaluation function, i.e. one that assigns an arbitrary value to any given position.
This has in fact been done, and the relevant part of the experiment is what happened when the resulting engine was made to play against itself at various search depths. Naively, you’d expect that since the evaluation function has no correlation with the actual value of the position, the engine would make more-or-less random moves regardless of its search depth—but in fact, this isn’t the case: even with a completely random evaluation function, higher search depths consistently beat lower search depths.
Examining the games revealed the reason: the high-depth version of the engine consistently made moves that gave its pieces more mobility, i.e. more legal moves in subsequent positions. This is because, with an evaluation function that assigns arbitrary values to leaf nodes, the “most extreme” value (which is what minimax cares about) is equally likely to sit at any leaf, and hence branches with more leaves are more likely to contain it and be selected. And since mobility is in fact an important concept in real chess, this tendency favors the higher-depth player.
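The core of that argument can be checked numerically: if leaf values are i.i.d. random, the leaf holding the overall maximum is equally likely to be any leaf, so at a max node the branch with more leaves is more likely to contain it. A minimal sketch (my own toy check, not the original experiment):

```python
import random

def branch_with_max(leaf_counts, rng):
    # Assign an i.i.d. uniform value to every leaf; return the index
    # of the branch whose best leaf holds the overall maximum.
    best = [max(rng.random() for _ in range(n)) for n in leaf_counts]
    return best.index(max(best))

rng = random.Random(0)
trials = 10_000
wins = sum(branch_with_max([10, 2], rng) == 0 for _ in range(trials))
# The 10-leaf branch should hold the maximum about 10/12 ≈ 83% of the time.
print(wins / trials)
```

Under a random evaluation, "more leaves under a move" is a proxy for "more legal continuations", which is exactly mobility.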
At the time I first learned about this experiment, it struck me as merely a fascinating coincidence: the fact that maximizing over randomly valued leaves spread unevenly across the tree naturally approximates the “mobility” concept was interesting, but nothing more. But the standpoint of instrumental convergence reveals another perspective: the concept chess players call “mobility” is important precisely because it emerges even in such a primitive system, one almost entirely divorced from the rules and goals of the game. In short, mobility is an instrumentally useful resource—not just in chess, but in all chess-like games, simply because it’s useful to have more legal moves no matter what your goal is.
(This is actually borne out in games of a chess variant called “suicide chess”, where the goal is to force your opponent to checkmate you. Despite having a completely opposite terminal goal to that of regular chess, games of suicide chess actually strongly resemble games of regular chess, at least during the first half. The reason for this is simply that in both games, the winning side needs to build up a dominant position before being able to force the opponent to do anything, whether that “anything” be delivering checkmate or being checkmated. Once that dominant position has been achieved, you can make use of it to attain whatever end state you want, but the process of reaching said dominant position is the same across variants.)
Additionally, there are some analogies to be drawn to the three types of utility function discussed in the post:
Depth-1 search is analogous to utility functions over action-observation histories (AOH), in that its move selection criterion depends only on the current position. With a random (arbitrary) evaluation function, move selection in this regime is equivalent to drawing randomly from a uniform distribution across the set of immediately available next moves, with no concern for what happens after that. There is no tendency for depth-1 players to seek mobility.
Depth-2 search is analogous to a utility function in a Markov decision process (MDP), in that its move selection criterion depends on the assessment of the immediate next time-step. With a random (arbitrary) evaluation function, move selection in this regime is equivalent to drawing randomly from a uniform distribution across the set of possible replies to one’s immediately available moves, which in turn is equivalent to drawing from a distribution over one’s next moves weighted by the number of replies. There is a mild tendency for depth-2 players to seek mobility.
Finally, maximum-depth search would be analogous to the classical utility function over observation histories (OH). With a random (arbitrary) evaluation function, move selection in this regime almost always picks whichever branch of the tree leads to the maximum possible number of terminal states. What this looks like in principle is unknown, but empirically we see that high-depth players have a strong tendency to seek mobility.
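The depth dependence can also be simulated directly. Below is a toy construction of my own (not from the Beal and Smith paper): a random-evaluation minimax compares a move that preserves six of our own follow-up moves against one that leaves us only two, with two opponent replies in either case. The high-mobility move comes out ahead most of the time:

```python
import random

def minimax(depth, branching, maximizing, rng):
    # Random-evaluation minimax on a uniform tree: branching[d] gives the
    # number of moves available at ply d; leaves get i.i.d. uniform values.
    if depth == len(branching):
        return rng.random()
    vals = (minimax(depth + 1, branching, not maximizing, rng)
            for _ in range(branching[depth]))
    return max(vals) if maximizing else min(vals)

rng = random.Random(0)
trials = 5_000
a_wins = 0
for _ in range(trials):
    # Opponent to move first (min layer), then it is our turn (max layer).
    value_a = minimax(0, [2, 6], False, rng)  # move A: we keep 6 options
    value_b = minimax(0, [2, 2], False, rng)  # move B: we keep only 2
    a_wins += value_a > value_b
print(a_wins / trials)  # the high-mobility move wins well over half the time
```

Analytically, the win rate for the high-mobility move here works out to roughly 0.84; the point is only that the preference emerges from the tree shape alone, with no chess knowledge anywhere in the system.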
In case anyone else is looking for a source, a good search term is probably the “Beal effect”. From the original paper by Beal and Smith:
Once the effect is pointed out, it does not take long to arrive at the conclusion that it arises from a natural correlation between a high branching factor in the game tree and having a winning move available. In other words, mobility (in the sense of having many moves available) is associated with better positions