This is correct, but at least for the quote above, the most important distinction is that most RL algorithms propagate credit back across steps within an episode, but not across episodes.
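To make the step-versus-episode point concrete, here is a minimal sketch of how discounted returns are commonly computed over a batch of concatenated episodes. The function name `discounted_returns`, the `dones` flag layout, and the toy numbers are illustrative assumptions, not taken from any particular library or from the quoted text. The running return is reset at each episode boundary, so a reward in one episode never contributes to the return of a step in an earlier episode.

```python
def discounted_returns(rewards, dones, gamma=0.99):
    """Compute per-step discounted returns for concatenated episodes.

    rewards[t] is the reward at step t; dones[t] is True when step t
    is the last step of its episode. (Hypothetical layout for illustration.)
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step sees the discounted sum of later rewards.
    for t in reversed(range(len(rewards))):
        if dones[t]:
            # Episode boundary: nothing from the following episode flows back.
            running = 0.0
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Two episodes of three steps each, concatenated into one batch.
rewards = [0.0, 0.0, 1.0, 0.0, 0.0, 5.0]
dones = [False, False, True, False, False, True]
returns = discounted_returns(rewards, dones, gamma=0.5)
```

In this toy batch, the returns for the first three steps depend only on the first episode's final reward; changing the second episode's reward leaves them untouched, which is the sense in which credit is propagated across steps but not across episodes.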