I like this post. Clear thesis, concrete example, and an argument that makes sense.
One consequence of your point is that in situations where RL training is metaphorically energy-constrained (for example, a per-step negative reward that pushes the agent to finish as fast as possible, or a narrow space where jittering might mean falling to its death and getting a very bad reward), we should not see jitters. Is that consistent with the literature?
Thanks! That’s definitely a consequence of the argument.
It looks to me like that prediction is generally true, from what I remember of RL videos I've seen: e.g., the Breakout paddle moves much more smoothly when the ball is near, DeepMind's agents move more smoothly when being chased in tag, and so on. I should definitely make a mental note to be alert to possible exceptions, though. I'm not aware of anywhere this has been treated systematically.
I feel like I once saw RL agents trained with and without energy costs, where the agents trained with energy costs acted a lot less jittery. But I can’t remember where I saw it.
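To make the argument concrete, here is a minimal toy sketch (a hypothetical 1-D environment I made up for illustration, not taken from any paper or library). When nothing in the reward penalizes movement, a policy that waits by jittering earns exactly the same return as one that waits by standing still, so training has no pressure to remove the jitter. Adding a per-move `energy_cost`, or a per-step `step_penalty` when the jitter delays reaching the goal, makes jittering strictly worse.

```python
# Toy 1-D world (hypothetical, for illustration only): the agent starts at
# position 0, each action moves it by -1, 0 or +1, and reaching GOAL ends the
# episode with a reward of +10.
GOAL = 5


def episode_return(actions, step_penalty=0.0, energy_cost=0.0):
    """Undiscounted return of a fixed action sequence.

    step_penalty: reward subtracted every timestep (pressure to finish fast).
    energy_cost:  reward subtracted per unit of movement (penalizes jitter).
    """
    pos, total = 0, 0.0
    for a in actions:
        pos += a
        total -= step_penalty + energy_cost * abs(a)
        if pos == GOAL:
            return total + 10.0
    return total  # goal never reached


# Both policies wait four steps, then walk to the goal; one waits by standing
# still, the other by jittering back and forth.
still = [0, 0, 0, 0, 1, 1, 1, 1, 1]
jitter = [1, -1, 1, -1, 1, 1, 1, 1, 1]

for name, kwargs in [("no cost", {}), ("energy cost", {"energy_cost": 0.1})]:
    print(name, episode_return(still, **kwargs), episode_return(jitter, **kwargs))
```

With `step_penalty` instead, a jitter that delays arrival also loses reward: under `step_penalty=1.0`, walking straight (`[1] * 5`) returns 5.0, while a detour like `[1, -1, 1, 1, 1, 1, 1]` returns 3.0.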