Vika comments on Tradeoff between desirable properties for baseline choices in impact measures

Vika 12 Jul 2020 15:37 UTC
LW: 4 AF: 3
Looks great, thanks! Minor point: in the sparse reward case, rather than “setting the baseline to the last state in which a reward was achieved”, we set the initial state of the inaction baseline to be this last rewarded state, and then apply noops from this initial state to obtain the baseline state (otherwise this would be a starting state baseline rather than an inaction baseline).
- Rohin Shah 12 Jul 2020 17:16 UTC
  LW: 4 AF: 3
  Good point, changed
  “by setting the baseline to the last state in which a reward was achieved.”
  to
  “by using the inaction baseline, and resetting its initial state to the current state whenever a reward is achieved.”
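
To make the distinction in this thread concrete, here is a minimal sketch of an inaction baseline whose initial state is reset whenever a reward is achieved. Everything in it is an assumption made for illustration, not something taken from the paper or the newsletter: the class name `ResettingInactionBaseline`, the toy integer state, the deterministic `drift` function standing in for the environment's noop dynamics, and treating any nonzero reward as "a reward was achieved".

```python
from typing import Callable


class ResettingInactionBaseline:
    """Hypothetical tracker for an inaction baseline that is reset on reward.

    Assumes the environment's response to a noop is deterministic and is
    available as a function from state to next state.
    """

    def __init__(self, initial_state: int, noop_step: Callable[[int], int]) -> None:
        self._noop_step = noop_step
        self.baseline_state = initial_state

    def update(self, current_state: int, reward: float) -> int:
        """Advance the baseline by one timestep and return the baseline state."""
        if reward != 0:
            # Reward achieved: the inaction rollout restarts from the agent's
            # current state, with zero noops applied so far.
            self.baseline_state = current_state
        else:
            # No reward: keep rolling the baseline forward with one more noop.
            self.baseline_state = self._noop_step(self.baseline_state)
        return self.baseline_state


def drift(state: int) -> int:
    """Toy noop dynamics (an assumption): the world moves by +1 on its own."""
    return state + 1


baseline = ResettingInactionBaseline(initial_state=0, noop_step=drift)

# (agent's state after acting, reward received) for four timesteps.
trajectory = [(5, 0.0), (7, 1.0), (20, 0.0), (30, 0.0)]
baseline_states = [baseline.update(s, r) for s, r in trajectory]
print(baseline_states)  # [1, 7, 8, 9]
```

In this toy run the baseline keeps evolving under noops after the reward at state 7 (to 8, then 9). A starting state baseline anchored at the last rewarded state would instead stay fixed at 7, which is the difference the comment above points out.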