Why doesn’t normalizing rewards work?
(i.e. set max_pi(expected_returns) = 1 and min_pi(expected_returns) = 0, for all environments)… I assume this is what you're talking about at the end?
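To make the proposed normalization concrete, here is a minimal sketch. It assumes a finite set of candidate policies whose expected returns are already known; the dictionary input and the name `normalize_returns` are hypothetical. Each policy's expected return J(pi) is rescaled as (J(pi) - min) / (max - min), so the best policy maps to 1 and the worst maps to 0.

```python
from typing import Dict


def normalize_returns(expected_returns: Dict[str, float]) -> Dict[str, float]:
    """Affinely rescale expected returns so max_pi -> 1 and min_pi -> 0.

    Hypothetical helper: assumes the expected return of every policy in a
    finite set is already known.
    """
    hi = max(expected_returns.values())
    lo = min(expected_returns.values())
    if hi == lo:
        # Every policy does equally well, so the rescaling is undefined.
        raise ValueError("max and min expected returns are equal")
    return {pi: (j - lo) / (hi - lo) for pi, j in expected_returns.items()}


# Three hypothetical policies with made-up expected returns.
print(normalize_returns({"a": 10.0, "b": -5.0, "c": 2.5}))
# → {'a': 1.0, 'b': 0.0, 'c': 0.5}
```

Note that the sketch needs the max and min over policies, which in practice means already knowing the best and worst achievable expected returns in each environment.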