It’s not clear to me whether you actually meant to suggest this as well, but this line of reasoning makes me wonder if many of our values are actually not that complicated and fragile after all, instead being connected to AU considerations. E.g. self-determination theory’s basic needs of autonomy, competence, and relatedness seem like different ways of increasing your AU, and the boredom example might not feel catastrophic because of some highly arbitrary “avoid boredom” bit in the utility function, but rather because looping a single experience over and over isn’t going to help you maintain your ability to avoid catastrophes. (That is, our motivations and values optimize for maintaining AU among other things, even if that is not what those values feel like from the inside.)
Intriguing. I don’t know whether that suggests our values aren’t as complicated as we thought, or whether the pressures which selected them are not complicated.
While I’m not an expert on the biological intrinsic motivation literature, I think it’s at least true that some parts of our values were selected for because they’re good heuristics for maintaining AU. This is the thing that MCE was trying to explain:
The paper’s central notion begins with the claim that there is a physical principle, called “causal entropic forces,” that drives a physical system toward a state that maximizes its options for future change. For example, a particle inside a rectangular box will move to the center rather than to the side, because once it is at the center it has the option of moving in any direction. Moreover, argues the paper, physical systems governed by causal entropic forces exhibit intelligent behavior.
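If I’m remembering Wissner-Gross & Freer’s paper correctly, the formal claim is roughly that the system feels a force proportional to the gradient of its causal path entropy:

$$\mathbf{F}(\mathbf{X}_0, \tau) = T_c \, \nabla_{\mathbf{X}} S_c(\mathbf{X}, \tau)\big|_{\mathbf{X}_0},$$

where $S_c(\mathbf{X}, \tau)$ is the entropy of the distribution over feasible paths of duration $\tau$ starting from macrostate $\mathbf{X}$, and $T_c$ is a “causal path temperature” setting the force’s strength. The particle-in-a-box example falls out of this: path entropy is maximized at the center.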
I think they have this backwards: intelligent behavior often results in instrumentally convergent behavior (and not necessarily the other way around). Similarly, Salge et al. overview the behavioral empowerment hypothesis:
The adaptations brought about by natural evolution produce organisms that, in the absence of specific goals, behave as if they were maximizing [mutual information between their actions and future observations].
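For reference, $n$-step empowerment is usually defined (my notation, but following Salge et al. as I understand them) as the channel capacity between the agent’s next $n$ actions and the state it ends up observing:

$$\mathfrak{E}_n(s) = \max_{p(a_{1:n})} I(A_{1:n};\, S_{n+1} \mid s).$$

In a deterministic, fully observed environment this reduces to the log of the number of distinct states reachable in $n$ steps, which is why it behaves like a “keep your options open” drive.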
As I discuss in section 6.1 of Optimal Farsighted Agents Tend to Seek Power, I think that “ability to achieve goals in general” (power) is a better intuitive and technical notion than information-theoretic empowerment. I think it’s pretty plausible that we have heuristics which, all else equal, push us to maintain or increase our power.
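To gesture at the difference concretely, here’s a minimal sketch (my own toy construction, not code from the paper) of power as average attainable utility: in a five-state deterministic chain, estimate each state’s power by averaging its optimal value over randomly drawn reward functions. The `succ` table and `optimal_value` helper are hypothetical names I made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9

# A toy deterministic MDP: five states in a line; from each state you can
# stay put or step to an adjacent state. succ[s] = states reachable in one step.
succ = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2, 3], 3: [2, 3, 4], 4: [3, 4]}

def optimal_value(reward, iters=200):
    """Value iteration for this deterministic MDP with state-based rewards."""
    v = np.zeros(len(succ))
    for _ in range(iters):
        v = np.array([reward[s] + gamma * max(v[t] for t in succ[s])
                      for s in succ])
    return v

# Power(s) ~ the average of V*_R(s) over reward functions R drawn at random.
n_samples = 2000
power = np.zeros(len(succ))
for _ in range(n_samples):
    power += optimal_value(rng.uniform(size=len(succ)))
power /= n_samples

for s, p in enumerate(power):
    print(f"state {s}: estimated power ~ {p:.3f}")
```

The middle state comes out on top, since it reaches any given state fastest and so does best on average across randomly drawn goals. Note that this matches the empowerment ranking in this toy setting (the center also has the most reachable states), which is part of why the two notions are easy to conflate even though they come apart in general.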