Right. Imagine an agent picking actions in a discrete-time game. Each time-advancing decision is a step. (E.g. for a debate, submitting one argument is a step.) But you don’t just leave it running forever: typically, you occasionally reset the environment to a (potentially random) starting state and let the agent try again. Each such run, from one reset to the next, is an episode.
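To make the step/episode distinction concrete, here is a minimal sketch of the interaction loop. The `CoinGame` environment, its fixed `horizon`, and the random action choice are all made up for illustration and are not taken from any particular library.

```python
import random


class CoinGame:
    """Hypothetical toy environment: every episode lasts exactly `horizon` steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self, rng):
        # Begin a new episode from a (potentially random) starting state.
        self.t = 0
        return rng.randint(0, 1)

    def step(self, action, rng):
        # One time-advancing decision: this is a single environment step.
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= self.horizon
        return rng.randint(0, 1), reward, done


def run(num_episodes, horizon=5, seed=0):
    rng = random.Random(seed)
    env = CoinGame(horizon)
    total_steps = 0
    for _ in range(num_episodes):          # outer loop: episodes
        obs = env.reset(rng)
        done = False
        while not done:                    # inner loop: steps within one episode
            action = rng.randint(0, 1)     # placeholder policy: act at random
            obs, reward, done = env.step(action, rng)
            total_steps += 1
    return num_episodes, total_steps
```

Calling `run(3, horizon=5)` plays 3 episodes of 5 steps each, so it counts 15 environment steps in total.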
This is correct, but at least in the quote above, the most important distinction is that most RL algorithms propagate credit back across steps within an episode, but not across episodes.
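One common way this shows up is in how discounted returns are computed. The sketch below assumes a flat list of rewards with a `dones` flag marking the last step of each episode; the function name and the data layout are my own choices for illustration, and other algorithms assign credit differently.

```python
def discounted_returns(rewards, dones, gamma=0.5):
    """Return-to-go for each step; `dones[i]` is True on an episode's last step."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for i in reversed(range(len(rewards))):
        if dones[i]:
            # Episode boundary: rewards from the next episode do not flow back.
            running = 0.0
        running = rewards[i] + gamma * running
        returns[i] = running
    return returns


# Two episodes stored back to back: steps 0-2 and steps 3-4.
rewards = [0.0, 0.0, 1.0, 0.0, 10.0]
dones = [False, False, True, False, True]
print(discounted_returns(rewards, dones))  # [0.25, 0.5, 1.0, 5.0, 10.0]
```

The reward of 10 at the end of the second episode raises the return of the earlier step in that same episode, but leaves the first episode's returns untouched.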
I agree, except I want to add a caveat.
Sometimes ‘step’ refers to such atomic environmental interactions. Then this is right.
Other times, ‘step’ (especially ‘training step’ or ‘gradient step’, but not always qualified) refers to a step in a training algorithm. For example, a classic pattern in RL is to collect many episodes or sub-episode trajectory fragments and use them to compute a gradient update. That update is also called a ‘step’. Outside of RL, this is probably the only (or at least main) use of the word ‘step’.
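A rough sketch of that pattern, with the two kinds of step counted separately. Everything here is an illustrative assumption: a one-parameter Bernoulli policy, a toy reward of 1 for choosing action 1, and a plain REINFORCE-style update with no baseline. It collects whole episodes only, not sub-episode fragments.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(num_training_steps=3, episodes_per_batch=4, horizon=5, lr=0.1, seed=0):
    rng = random.Random(seed)
    theta = 0.0      # single policy parameter: P(action = 1) = sigmoid(theta)
    env_steps = 0    # counts atomic environment interactions
    for _update in range(num_training_steps):
        grad = 0.0
        for _episode in range(episodes_per_batch):
            p = sigmoid(theta)
            episode_return = 0.0
            score = 0.0  # sum over steps of d/dtheta log pi(action)
            for _t in range(horizon):
                action = 1 if rng.random() < p else 0
                episode_return += float(action)  # toy reward: 1 for action 1
                score += action - p
                env_steps += 1
            grad += episode_return * score
        # One gradient ("training") step, computed from a whole batch of episodes.
        theta += lr * grad / episodes_per_batch
    return num_training_steps, env_steps, theta


training_steps, env_steps, theta = train()
print(training_steps, env_steps)  # 3 60
```

With these defaults, 3 training steps consume 3 × 4 × 5 = 60 environment steps, which is why an unqualified ‘step’ count can differ by orders of magnitude depending on which meaning is intended.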
Thanks Charlie, Evan H. and Oliver. Your comments definitely help to give me a clearer picture.