So let D0 be the training data of videos of happy humans, R1 the correct “make humans happy” reward function, and R2 the degenerate reward function “make videos of happy humans”[2].
We’d want the AI to deduce R2 from D0.
Should this say “deduce R1 from D0”?
Indeed, thanks! Corrected.