IAFF-User-111 comments on Learning values versus indifference

IAFF-User-111 26 Aug 2016 20:51 UTC
Why doesn’t normalizing rewards work?

(I.e., for every environment, set max_pi(expected returns) = 1 and min_pi(expected returns) = 0.) I assume this is what you’re talking about at the end?
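
To make the question concrete, here is a minimal sketch of the normalization I have in mind, for a single environment. It assumes a finite set of policies whose expected returns are already known; the function name `normalize_returns` and the policy labels are hypothetical, just for illustration.

```python
def normalize_returns(expected_returns):
    """Affinely rescale expected returns for one environment.

    expected_returns: dict mapping a policy label to its expected return
    (assumed finite and precomputed; this is an illustrative assumption).
    After rescaling, the best policy scores 1 and the worst scores 0.
    """
    lo = min(expected_returns.values())
    hi = max(expected_returns.values())
    if hi == lo:
        # Every policy does equally well, so the rescaling is undefined.
        raise ValueError("all policies have the same expected return")
    return {pi: (v - lo) / (hi - lo) for pi, v in expected_returns.items()}


# Hypothetical environment with three policies.
normalized = normalize_returns({"a": -5.0, "b": 0.0, "c": 15.0})
print(normalized)
```

The same rescaling would be applied separately in each environment, so that the returns are on a common [0, 1] scale before they are compared or mixed.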