(I would describe this as ‘obviously correct’, and indeed almost ‘the entire point of RL’ in general: to maximize long-run reward, rather than myopically maximizing next-step reward as if the ‘episode’ ended there.)
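A minimal sketch of the distinction, assuming a hypothetical two-action toy problem (the action names `grab` and `wait` and their reward sequences are invented for illustration). Myopic next-step maximization corresponds to a discount factor of 0, which scores every action as though the episode ended after one step. A discount near 1 values the delayed payoff and so chooses differently.

```python
# Hypothetical toy problem (illustrative only): "grab" pays 1 and the episode ends;
# "wait" pays 0 now but leads to a state that pays 10 on the next step.
REWARDS = {"grab": [1.0], "wait": [0.0, 10.0]}


def discounted_return(rewards, gamma):
    """Return sum over t of gamma**t * r_t for one episode's reward sequence."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def best_action(gamma):
    """Pick the action whose reward sequence has the highest discounted return."""
    return max(REWARDS, key=lambda a: discounted_return(REWARDS[a], gamma))


# gamma = 0: only the first reward counts, as if the episode stopped after one step.
myopic = best_action(gamma=0.0)
# gamma = 0.99: the delayed reward of 10 is worth 9.9, which beats the immediate 1.
farsighted = best_action(gamma=0.99)
print(myopic, farsighted)  # grab wait
```

The myopic agent takes the immediate reward of 1. The long-run agent forgoes it for the larger delayed payoff.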