In TD learning, if from some point the model always perfectly predicted the future…
This isn’t key for your point, but:
If it’s a perfect predictor of a deterministic world, sure. But if the world is stochastic, or if you can’t assume realizability, the network can simultaneously be at a global optimum and still receive gradient updates. The gradient is zero in expectation, but if you update on sufficiently small batches, individual updates can still be non-zero.
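A minimal sketch of what this means in practice (the one-state, coin-flip-reward environment below is a hypothetical illustration, not anything from the discussion above): the value estimate sits exactly at the true expected value, so the expected TD error, and hence the expected gradient, is zero, yet every individual sample-based update is non-zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a single state transitions to a terminal state and
# yields a reward of +1 or -1 with equal probability.  The true value of
# the state is therefore 0, and we set the estimate exactly to it, i.e.
# the predictor is already a global optimum of the expected TD objective.
v = 0.0           # value estimate = true expected value
gamma = 1.0       # successor state is terminal (value 0), so target = reward

td_errors = []
for _ in range(10_000):
    reward = rng.choice([1.0, -1.0])     # stochastic outcome
    target = reward + gamma * 0.0        # bootstrap off the terminal value
    td_errors.append(target - v)         # per-sample TD error (drives the update)

print("mean TD error  :", np.mean(td_errors))          # ~0: zero in expectation
print("mean |TD error|:", np.mean(np.abs(td_errors)))  # ~1: each update non-zero
```

With batch size 1 the update is almost never zero, even though averaging over many samples drives it toward zero; that is the sense in which the optimum is only a fixed point in expectation.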