Thomas Kwa comments on Different perspectives on concept extrapolation

Thomas Kwa 8 Apr 2022 16:14 UTC
3 points
> So let $D_{0}$ be the training data of videos of happy humans, $R_{1}$ the correct “make humans happy” reward function, and $R_{2}$ the degenerate reward function “make videos of happy humans”^[2].
> We’d want the AI to deduce $R_{2}$ from $D_{0}$.
Should this say “deduce $R_{1}$ from $D_{0}$”?
- Stuart_Armstrong 8 Apr 2022 17:56 UTC
  2 points
  Indeed, thanks! Corrected.