I think epsilon-exploration is done for different reasons, but there are a bunch of cases in which “add some noise and then let the noise go to zero” is a viable strategy to solve problems. Here it’s done mainly to sidestep an issue of “dividing by zero”, which makes me think that there’s some kind of argument which sidesteps it by using limits or something like that. It feels similar to what happens when you try to divide by zero when differentiating a function.
The RL case is different and is more reminiscent of e.g. simulated annealing, where adding noise to an optimization procedure and letting the noise tend to zero over time improves performance compared to a more greedy approach. I don’t think these are quite the same thing as what’s happening with the EDT situation here, it seems to me like an application of the same technique for quite different purposes.
Here it’s done mainly to sidestep an issue of “dividing by zero”, which makes me think that there’s some kind of argument which sidesteps it by using limits or something like that.
I think epsilon-exploration is done for different reasons, but there are a bunch of cases in which “add some noise and then let the noise go to zero” is a viable strategy to solve problems. Here it’s done mainly to sidestep an issue of “dividing by zero”, which makes me think that there’s some kind of argument which sidesteps it by using limits or something like that. It feels similar to what happens when you try to divide by zero when differentiating a function.
The RL case is different and is more reminiscent of e.g. simulated annealing, where adding noise to an optimization procedure and letting the noise tend to zero over time improves performance compared to a more greedy approach. I don’t think these are quite the same thing as what’s happening with the EDT situation here, it seems to me like an application of the same technique for quite different purposes.
Here’s my attempt at sidestepping: EDT solves 5 and 10 with conditional oracles.