Vivek Hebbar comments on Jitters No Evidence of Stupidity in RL

Vivek Hebbar 18 Sep 2021 2:19 UTC
3 points
The random jittering reminds me of the random movements of the stock market: as new information trickles in, the estimate of the optimal point jitters around noisily rather than following a smooth trajectory. If the value being estimated is Utility(action A) - Utility(action B), then we would expect the agent to jitter between the two actions whenever the estimate is near zero, much like a random walk repeatedly crossing the axis.
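To make the random-walk picture concrete, here is a minimal toy sketch. It assumes the estimate of Utility(action A) - Utility(action B) evolves as a Gaussian random walk and that the agent simply picks whichever action currently looks better. The function names, the noise model, and the parameters are all illustrative assumptions, not anything taken from the post or from a real RL system.

```python
import random


def simulate_jitter(start_gap=0.0, noise=1.0, steps=1000, seed=0):
    """Return the action chosen at each step of a toy random-walk estimate.

    Assumption: the estimated utility gap U(A) - U(B) starts at `start_gap`
    and is nudged by independent Gaussian noise at every step.
    """
    rng = random.Random(seed)
    estimate = start_gap
    actions = []
    for _ in range(steps):
        estimate += rng.gauss(0.0, noise)  # new information shifts the estimate
        actions.append("A" if estimate > 0 else "B")  # greedy choice on the sign
    return actions


def count_switches(actions):
    """Count how often the chosen action differs from the previous step."""
    return sum(1 for prev, cur in zip(actions, actions[1:]) if prev != cur)


if __name__ == "__main__":
    near_zero = count_switches(simulate_jitter(start_gap=0.0))
    far_from_zero = count_switches(simulate_jitter(start_gap=1000.0))
    print("switches with the gap near zero:", near_zero)
    print("switches with the gap far from zero:", far_from_zero)
```

Under these assumptions, the agent should switch actions only while the estimate hovers around zero. Starting far from zero, the walk has essentially no chance of reaching the axis within the run, so the choice stays fixed.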