Vanessa Kosoy comments on Vanessa Kosoy’s Shortform

Vanessa Kosoy 6 Feb 2023 16:20 UTC
For the contrived reward function you suggested, we would never have $C(\pi) \gg C(U)$. But for other reward functions, $C(\pi) \gg C(U)$ is possible. This is exactly why the framework rejects the contrived reward function in favor of those other reward functions. It is also why the framework considers some policies unintelligent (despite the availability of the contrived reward function) and other policies intelligent.