Vanessa Kosoy comments on A central AI alignment problem: capabilities generalization, and the sharp left turn

Vanessa Kosoy 30 Jun 2022 7:05 UTC
LW: 8 AF: 4
When I say “policy”, I mean the entire behavior including the learning algorithm, not some asymptotic behavior the system is converging to. Obviously, the policy is represented as genetic code, not as individual decisions. When I say “evolution is directly selecting the policy”, I mean that genotypes are selected based on their “expected reward” (reproductive fitness) rather than, e.g., by evaluating the accuracy of the world-models those minds produce^[1]. Also, genotypes are not a priori constrained to be learning algorithms with particular architectures; that’s something the outer loop has to learn.
1. Evolution is not even model-free RL, since in MFRL we train a network to estimate the value function or the Q-function of different states; we don’t just do gradient descent on the expected reward. But MFRL does have the problem of extrapolating the reward function incorrectly away from the training data. ↩︎
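To make the contrast concrete, here is a toy sketch in Python. Everything in it is an illustrative assumption rather than anything from the discussion: a single-state, two-armed bandit stands in for the environment, a "genotype" is just the probability of picking arm 1, and the names `evolve` and `q_learning` are hypothetical. The first loop selects genotypes purely on sampled average reward (a stand-in for reproductive fitness), while the second learns per-action value estimates and acts on them, which is the extra structure MFRL has.

```python
import random

# Hypothetical two-armed bandit: arm 1 pays off more often than arm 0.
ARM_MEANS = [0.2, 0.8]

def pull(arm, rng):
    """Return a Bernoulli reward from the chosen arm."""
    return 1.0 if rng.random() < ARM_MEANS[arm] else 0.0

def fitness(genotype, rng, episodes=200):
    """Average reward of a policy that picks arm 1 with probability `genotype`."""
    total = 0.0
    for _ in range(episodes):
        arm = 1 if rng.random() < genotype else 0
        total += pull(arm, rng)
    return total / episodes

def evolve(rng, pop_size=20, generations=30, sigma=0.1):
    """Selection on sampled reward only: no value function is ever estimated."""
    population = [rng.random() for _ in range(pop_size)]
    for _ in range(generations):
        # Rank genotypes by noisy fitness and keep the better half.
        ranked = sorted(population, key=lambda g: fitness(g, rng), reverse=True)
        survivors = ranked[: pop_size // 2]
        # Each survivor leaves one mutated child, clipped to [0, 1].
        children = [min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in survivors]
        population = survivors + children
    return sum(population) / len(population)

def q_learning(rng, steps=5000, alpha=0.05, epsilon=0.1):
    """Model-free RL: learn per-action value estimates, then act on them."""
    q = [0.0, 0.0]
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(2)           # explore
        else:
            arm = 0 if q[0] >= q[1] else 1   # exploit current estimates
        # Move the estimate for the pulled arm toward the observed reward.
        q[arm] += alpha * (pull(arm, rng) - q[arm])
    return q

rng = random.Random(0)
mean_genotype = evolve(rng)   # population-average probability of picking arm 1
q_values = q_learning(rng)    # learned value estimates for arms 0 and 1
```

Both procedures end up favoring the better arm, but only `q_learning` maintains an explicit estimate of how good each action is; `evolve` never represents values at all, it just keeps whatever genotypes scored well.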