Steven Byrnes comments on Reward is not the optimization target

Steven Byrnes 26 Jul 2022 17:13 UTC
LW: 5 AF: 3
Sure, other things being equal. But other things aren’t necessarily equal. For example, regularization could stack the deck in favor of one policy over another, even if the latter has been systematically producing slightly higher reward. There are lots of things like that; the details depend on the exact RL algorithm. In the context of brains, I discuss this, with examples, in §9.3.3 here.
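To make the regularization point concrete, here is a minimal toy sketch, not a claim about any particular RL algorithm or about brains. Everything in it is an assumption chosen for illustration: a two-armed bandit where arm 1 pays slightly more than arm 0 (1.05 vs 1.00), a one-parameter sigmoid policy, exact gradients in place of sampled ones, and an L2 penalty (weight `lam`) pulling the parameter back toward a starting value `theta0` that favors the lower-reward arm.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(lam, theta0=-2.0, r0=1.00, r1=1.05, lr=1.0, steps=5000):
    """Gradient ascent on: expected reward - lam * (theta - theta0)**2.

    The policy picks arm 1 with probability sigmoid(theta).
    All names and numbers are hypothetical, chosen for illustration.
    Returns the final probability of picking the higher-reward arm.
    """
    theta = theta0
    for _ in range(steps):
        p = sigmoid(theta)
        # d/dtheta of expected reward p*r1 + (1-p)*r0
        grad_reward = p * (1.0 - p) * (r1 - r0)
        # d/dtheta of the L2 penalty lam * (theta - theta0)**2
        grad_penalty = 2.0 * lam * (theta - theta0)
        theta += lr * (grad_reward - grad_penalty)
    return sigmoid(theta)


p_unreg = train(lam=0.0)  # no regularization: drifts toward the better arm
p_reg = train(lam=0.1)    # regularized: stays near the prior, lower-reward arm

print(f"P(arm 1) without regularization: {p_unreg:.3f}")
print(f"P(arm 1) with regularization:    {p_reg:.3f}")
```

Under these assumptions, the unregularized learner ends up choosing the higher-reward arm almost always, while the regularized learner keeps mostly choosing the lower-reward arm, even though the reward signal consistently favors the other one. How strong this effect is depends entirely on the penalty weight and the size of the reward gap.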