It is impact that doesn’t change (and in fact is always zero), since it is defined in reference to your actual utility function.
Rather, if you’re optimal, you aren’t personally impacted in expectation (beyond R(s,a,s′)). Even omniscient agents may have to deal with stochasticity in the dynamics.
ETA: your comment updated as I posted this. I think it’s still worth adding this point.
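To make the "in expectation" point concrete, here is a minimal sketch on a made-up two-state MDP. Everything in it (the `MDP` table, `GAMMA`, and the helper names) is hypothetical and only for illustration. It also assumes one particular reading of "impact" for an optimal agent: the realized surprise R(s,a,s′) + γV*(s′) − V*(s). That is my assumption, not a definition taken from the post.

```python
GAMMA = 0.9  # assumed discount factor

# Hypothetical toy MDP: MDP[s][a] = list of (probability, next_state, reward).
MDP = {
    0: {0: [(0.5, 0, 0.0), (0.5, 1, 1.0)],
        1: [(1.0, 0, 0.2)]},
    1: {0: [(1.0, 1, 0.5)],
        1: [(0.7, 0, 0.0), (0.3, 1, 2.0)]},
}

def q_value(v, s, a):
    """Expected reward plus discounted next-state value for action a in state s."""
    return sum(p * (r + GAMMA * v[s2]) for p, s2, r in MDP[s][a])

def value_iteration(tol=1e-12):
    """Iterate the Bellman optimality backup until the values stop changing."""
    v = {s: 0.0 for s in MDP}
    while True:
        new_v = {s: max(q_value(v, s, a) for a in MDP[s]) for s in MDP}
        if max(abs(new_v[s] - v[s]) for s in MDP) < tol:
            return new_v
        v = new_v

def surprises(v, s):
    """(probability, realized surprise) for each outcome of the optimal action in s."""
    a = max(MDP[s], key=lambda act: q_value(v, s, act))
    return [(p, r + GAMMA * v[s2] - v[s]) for p, s2, r in MDP[s][a]]

V_STAR = value_iteration()

for s in MDP:
    outcomes = surprises(V_STAR, s)
    expected = sum(p * d for p, d in outcomes)
    print(s, [round(d, 3) for _, d in outcomes], round(expected, 9))
```

Under these assumptions, each individual outcome can leave the optimal agent better or worse off than it expected, but the probability-weighted average is zero up to numerical tolerance, which is just the Bellman optimality equation restated.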