Rohin Shah comments on Tradeoff between desirable properties for baseline choices in impact measures

Rohin Shah 12 Jul 2020 17:16 UTC
LW: 4 AF: 3
Good point, changed

"by setting the baseline to the last state in which a reward was achieved."

to

"by using the inaction baseline, and resetting its initial state to the current state whenever a reward is achieved."
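
To make the revised wording concrete, here is a minimal sketch of how such a baseline might be tracked. It assumes a deterministic environment and a hypothetical `noop_step` function that returns the next state when the agent does nothing; the class and all names are illustrative and not taken from the post.

```python
from typing import Callable, Hashable

State = Hashable


class ResettingInactionBaseline:
    """Inaction baseline whose initial state is reset on reward (illustrative)."""

    def __init__(self, initial_state: State, noop_step: Callable[[State], State]):
        # noop_step is an assumed environment model: next state under inaction.
        self.noop_step = noop_step
        self.baseline_state = initial_state

    def update(self, current_state: State, reward_achieved: bool) -> State:
        if reward_achieved:
            # Restart the inaction rollout from the agent's current state.
            self.baseline_state = current_state
        else:
            # Otherwise roll the baseline forward by one step of inaction.
            self.baseline_state = self.noop_step(self.baseline_state)
        return self.baseline_state


# Toy environment (assumption): the state is an integer that drifts by +1
# per step when the agent does nothing.
baseline = ResettingInactionBaseline(initial_state=0, noop_step=lambda s: s + 1)
trace = [
    baseline.update(current_state=5, reward_achieved=False),
    baseline.update(current_state=7, reward_achieved=False),
    baseline.update(current_state=10, reward_achieved=True),
    baseline.update(current_state=12, reward_achieved=False),
]
```

Between rewards the baseline evolves as if the agent had done nothing since the last reset, which is what distinguishes it from simply freezing the baseline at the last rewarded state.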