Intriguing. I don’t know whether that suggests our values aren’t as complicated as we thought, or whether the pressures which selected for them aren’t that complicated.
While I’m not an expert on the biological intrinsic motivation literature, I think it’s at least true that some parts of our values were selected for because they’re good heuristics for maintaining AU. This is what MCE was trying to explain:
The paper’s central claim is that there is a physical principle, called “causal entropic forces,” that drives a physical system toward a state that maximizes its options for future change. For example, a particle inside a rectangular box will move to the center rather than to the side, because once it is at the center it has the option of moving in any direction. Moreover, argues the paper, physical systems governed by causal entropic forces exhibit intelligent behavior.
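To make the “maximizing options” intuition concrete, here’s a minimal toy sketch (my own illustration, not the paper’s causal-entropic-forces formalism over path space): an agent on a bounded grid greedily moves to whichever adjacent cell has the most distinct cells reachable within a short horizon. It drifts toward the center of the box, where its options are widest.

```python
# Toy option-maximizing agent (my illustration, not the paper's method):
# greedily move to the neighbor with the most cells reachable in HORIZON steps.
WIDTH, HEIGHT, HORIZON = 9, 9, 3
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]  # includes staying put

def neighbors(cell):
    x, y = cell
    return [(x + dx, y + dy) for dx, dy in MOVES
            if 0 <= x + dx < WIDTH and 0 <= y + dy < HEIGHT]

def num_options(cell, steps):
    """Count distinct cells reachable within `steps` moves."""
    frontier = {cell}
    for _ in range(steps):
        frontier = {n for c in frontier for n in neighbors(c)}
    return len(frontier)

pos = (0, 0)  # start in a corner
for _ in range(10):
    pos = max(neighbors(pos), key=lambda c: num_options(c, HORIZON))
print(pos)  # drifts to the middle of the box, e.g. (4, 4)
```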
I think they have this backwards: intelligent behavior often results in instrumentally convergent behavior (and not necessarily the other way around). Similarly, Salge et al. summarize the behavioral empowerment hypothesis:
The adaptations brought about by natural evolution produce organisms that, in the absence of specific goals, behave as if they were maximizing [mutual information between their actions and future observations].
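To ground the term: n-step empowerment is the channel capacity between an agent’s next n actions and its subsequent observations. Here’s a minimal sketch for a deterministic toy world (my own illustration), where that capacity reduces to the log of the number of distinct reachable states; stochastic dynamics would need a full channel-capacity computation (e.g. Blahut-Arimoto) instead.

```python
# Toy n-step empowerment on a 1D line with walls (my illustration).
# In a *deterministic* world, the action-sequence -> final-state channel
# capacity is just log2(number of distinct reachable states).
from itertools import product
from math import log2

ACTIONS = (-1, 0, 1)   # step left, stay, step right
N_STATES = 7           # positions 0..6, walls at both ends

def transition(state, action):
    return min(max(state + action, 0), N_STATES - 1)  # walls clip movement

def empowerment(state, n):
    outcomes = set()
    for plan in product(ACTIONS, repeat=n):
        s = state
        for a in plan:
            s = transition(s, a)
        outcomes.add(s)
    return log2(len(outcomes))

for s in range(N_STATES):
    print(s, round(empowerment(s, 2), 2))
# Interior states reach 5 cells in 2 steps (log2(5) ≈ 2.32);
# a wall state reaches only 3 (log2(3) ≈ 1.58).
```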
As I discuss in section 6.1 of Optimal Farsighted Agents Tend to Seek Power, I think that “ability to achieve goals in general” (power) is a better intuitive and technical notion than information-theoretic empowerment. I think it’s pretty plausible that we have heuristics which, all else equal, push us to maintain or increase our power.
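Roughly speaking (this elides the normalization and discount bookkeeping in the paper’s actual definition), power is the expected optimal value attainable from a state under some distribution $\mathcal{D}$ over reward functions:

$$\mathrm{POWER}_{\mathcal{D}}(s) \;:=\; \mathbb{E}_{R \sim \mathcal{D}}\!\left[V^*_R(s)\right],$$

where $V^*_R(s)$ is the optimal value of $s$ under reward function $R$. Unlike empowerment’s goal-free channel capacity between actions and observations, this scores a state directly by how well goals drawn from $\mathcal{D}$ can be achieved from it.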