Vanessa Kosoy comments on Vanessa Kosoy’s Shortform

Vanessa Kosoy 2 Feb 2023 14:14 UTC
My framework discards such contrived reward functions because it penalizes the complexity of the reward function. In the construction you describe, we have $C(U) \approx C(\pi)$. This corresponds to $g \approx 0$ (no/low intelligence). On the other hand, policies with $g \gg 0$ (high intelligence) have the property that $C(\pi) \gg C(U)$ for the $U$ which “justifies” this $g$. In other words, your “minimal” overhead is very large from my point of view: to be acceptable, the “overhead” should be substantially negative.
- David Scott Krueger (formerly: capybaralet) 5 Feb 2023 12:01 UTC
  I think the construction gives us $C(\pi) \leq C(U) + e$ for a small constant $e$ (representing the wrapper). It seems like any compression you can apply to the reward function can be translated to the policy via the wrapper. So then you would never have $C(\pi) \gg C(U)$. What am I missing/misunderstanding?
  - Vanessa Kosoy 6 Feb 2023 16:20 UTC
    For the contrived reward function you suggested, we would never have $C(\pi) \gg C(U)$. But for other reward functions, it is possible that $C(\pi) \gg C(U)$. This is exactly why this framework rejects the contrived reward function in favor of those other reward functions. It is also why this framework considers some policies unintelligent (despite the availability of the contrived reward function) and other policies intelligent.
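A compact restatement of the exchange, as a sketch only: write $U_\pi$ for the contrived reward function built from $\pi$ (this notation is assumed here, not taken from the thread), $e$ for the wrapper that recovers $\pi$ from $U_\pi$, and $e'$ for the wrapper that defines $U_\pi$ from $\pi$. The two bounds together say the complexity gap for $U_\pi$ is only a constant, so $U_\pi$ cannot be the $U$ that witnesses $g \gg 0$; the wrapper bound says nothing about other reward functions $U$.

$$
\begin{aligned}
C(\pi) &\le C(U_\pi) + e, \qquad C(U_\pi) \le C(\pi) + e' \\
\Longrightarrow\quad \lvert C(\pi) - C(U_\pi) \rvert &\le \max(e, e') \\
g \gg 0 \quad &\text{requires some } U \text{ with } C(\pi) - C(U) \gg 0, \text{ hence } U \ne U_\pi
\end{aligned}
$$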