Thanks! That’s definitely a consequence of the argument.
It looks to me like that prediction is generally true, from what I remember of RL videos I've seen: the Breakout paddle moves much more smoothly when the ball is near, DeepMind's agents move more smoothly when being chased in tag, and so on. I should definitely make a mental note to be alert to possible exceptions, though. I'm not aware of anywhere it's been treated systematically.
I feel like I once saw RL agents trained with and without energy costs, and the agents trained with energy costs acted a lot less jittery, but I can't remember where I saw it.
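To make "energy costs" concrete: I'd expect it was something like a small action-magnitude penalty subtracted from the reward. Here's a minimal sketch of that idea, assuming a Gymnasium-style continuous-control environment; the wrapper and the cost_coef value are my own illustrative choices, not from whatever I saw:

```python
import gymnasium as gym
import numpy as np

class EnergyCostWrapper(gym.Wrapper):
    """Subtract a penalty proportional to squared action magnitude,
    so large or jittery actions cost reward."""

    def __init__(self, env, cost_coef=0.01):
        super().__init__(env)
        self.cost_coef = cost_coef

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # Quadratic "energy" penalty: zero for staying still,
        # growing with the size of the action taken.
        energy_cost = self.cost_coef * float(np.sum(np.square(action)))
        return obs, reward - energy_cost, terminated, truncated, info

# e.g. env = EnergyCostWrapper(gym.make("Pendulum-v1"), cost_coef=0.01)
```

With the penalty at zero you'd recover the unpenalized agent, so comparing the two trained policies would be a direct test of the jitter prediction.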