In TD learning, if from some point the model always perfectly predicted the future…
This isn’t key for your point, but:
If it’s a perfect predictor of a deterministic world, sure. But if the world is stochastic, or if you can’t assume realizability, the network can simultaneously be at a global optimum and still receive gradient updates. The gradient is zero in expectation, but if you update on sufficiently small batches, individual updates can still be non-zero.
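A minimal sketch of what this means in practice (the one-state, coin-flip-reward environment below is a hypothetical illustration, not anything from the discussion above): the value estimate sits exactly at the true expected value, so the expected TD error, and hence the expected gradient, is zero, yet every individual sample-based update is non-zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a single state transitions to a terminal state and
# yields a reward of +1 or -1 with equal probability.  The true value of
# the state is therefore 0, and we set the estimate exactly to it, i.e.
# the predictor is already a global optimum of the expected TD objective.
v = 0.0           # value estimate = true expected value
gamma = 1.0       # successor state is terminal (value 0), so target = reward

td_errors = []
for _ in range(10_000):
    reward = rng.choice([1.0, -1.0])     # stochastic outcome
    target = reward + gamma * 0.0        # bootstrap off the terminal value
    td_errors.append(target - v)         # per-sample TD error (drives the update)

print("mean TD error  :", np.mean(td_errors))          # ~0: zero in expectation
print("mean |TD error|:", np.mean(np.abs(td_errors)))  # ~1: each update non-zero
```

With batch size 1 the update is almost never zero, even though averaging over many samples drives it toward zero; that is the sense in which the optimum is only a fixed point in expectation.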