TurnTrout comments on TurnTrout’s shortform feed

TurnTrout 21 Oct 2020 2:37 UTC
From unpublished work.
The answer to this seems obvious in isolation: shaping helps with credit assignment, rescaling doesn’t (and might complicate certain methods in the advantage vs Q-value way). But I feel like maybe there’s an important interaction here that could inform a mathematical theory of how a reward signal guides learners through model space?
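A toy sketch of the "in isolation" answer, under some loud assumptions: I read "shaping" as potential-based shaping, r' = r + γΦ(s') − Φ(s), and "rescaling" as multiplying the reward by a positive constant. The 5-state chain, the potential `phi`, and `SCALE` are all made up for illustration and are not from the unpublished work.

```python
GAMMA = 0.9
N = 5               # states 0..4; state 4 is a terminal goal (hypothetical toy MDP)
ACTIONS = (-1, +1)  # index 0 = left, index 1 = right
SCALE = 10.0        # hypothetical positive rescaling constant


def step(s, a):
    """Deterministic chain dynamics; the walls clamp the position."""
    return min(max(s + a, 0), N - 1)


def base_reward(s, a, s2):
    """Sparse reward: 1 only on the transition that enters the goal."""
    return 1.0 if s2 == N - 1 else 0.0


def phi(s):
    """Assumed potential: minus the distance to the goal, so phi(goal) = 0."""
    return float(s - (N - 1))


def shaped_reward(s, a, s2):
    """Potential-based shaping: r + gamma * phi(s') - phi(s)."""
    return base_reward(s, a, s2) + GAMMA * phi(s2) - phi(s)


def scaled_reward(s, a, s2):
    """Positive rescaling of the same sparse reward."""
    return SCALE * base_reward(s, a, s2)


def q_values(reward, sweeps=300):
    """Tabular Q-value iteration; the terminal state's values stay at 0."""
    q = [[0.0 for _ in ACTIONS] for _ in range(N)]
    for _ in range(sweeps):
        for s in range(N - 1):
            for i, a in enumerate(ACTIONS):
                s2 = step(s, a)
                q[s][i] = reward(s, a, s2) + GAMMA * max(q[s2])
    return q


def greedy(q):
    """Greedy action index in each non-terminal state."""
    return [max(range(len(ACTIONS)), key=lambda i: q[s][i]) for s in range(N - 1)]


def first_step_gap(reward, s=0):
    """Immediate reward for 'right' minus 'left' in state s."""
    return reward(s, +1, step(s, +1)) - reward(s, -1, step(s, -1))


q_base = q_values(base_reward)
q_shaped = q_values(shaped_reward)
q_scaled = q_values(scaled_reward)

print("greedy policies:", greedy(q_base), greedy(q_shaped), greedy(q_scaled))
print("immediate right-vs-left gap at state 0:",
      first_step_gap(base_reward),
      first_step_gap(shaped_reward),
      first_step_gap(scaled_reward))
```

In this toy, both transformations leave the greedy policy alone: shaping shifts each Q-value by −Φ(s) and rescaling multiplies it by the constant. Only shaping makes the immediate reward tell the two actions apart far from the goal, which is the credit-assignment help; rescaling leaves that gap at zero and just changes the magnitude of the values.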