Ege Erdil comments on What is causality to an evidential decision theorist?

Ege Erdil 18 Apr 2022 22:05 UTC
1 point
I think epsilon-exploration in RL is done for different reasons, but there are a bunch of cases in which “add some noise and then let the noise go to zero” is a viable strategy for solving problems. Here it’s done mainly to sidestep an issue of “dividing by zero”, which makes me think that there’s some kind of argument which sidesteps it by using limits or something like that. It feels similar to what happens when you differentiate a function: the naive difference quotient at a single point would divide by zero, and taking a limit is what resolves it.
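To make the “dividing by zero” concrete, here is a minimal sketch assuming a toy 5-and-10 world in which the agent takes $5 or $10 and its utility is just the dollars taken. An EDT-style value is E[U | A = a] = E[U · 1{A = a}] / P(A = a), which is undefined for an action the agent never takes. Mixing in epsilon-exploration gives every action positive probability, so the ratio is defined for every epsilon > 0, and one can then ask what happens as epsilon goes to zero. The names `outcome_distribution` and `conditional_expectation` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical toy 5-and-10 world: the agent takes $5 or $10, and its utility
# is simply the number of dollars taken.
ACTIONS = (5, 10)


def outcome_distribution(greedy_action, epsilon):
    """Joint distribution as (probability, action, utility) triples for an agent
    that picks uniformly at random with probability epsilon and otherwise takes
    greedy_action."""
    dist = []
    for a in ACTIONS:
        p = epsilon / len(ACTIONS) + (1 - epsilon if a == greedy_action else 0)
        dist.append((p, a, a))  # utility equals the action's dollar amount
    return dist


def conditional_expectation(dist, action):
    """EDT-style value E[U | A = action] = E[U * 1{A = action}] / P(A = action).
    Returns None when P(A = action) == 0, i.e. the 'divide by zero' case."""
    p_action = sum(p for p, a, _ in dist if a == action)
    if p_action == 0:
        return None
    return sum(p * u for p, a, u in dist if a == action) / p_action


# A deterministic agent (epsilon = 0) that always takes $10 has no defined
# value for taking $5; any epsilon > 0 makes the ratio well-defined.
for eps in (Fraction(0), Fraction(1, 10), Fraction(1, 1000)):
    print(eps, conditional_expectation(outcome_distribution(10, eps), 5))
# 0 None
# 1/10 5
# 1/1000 5
```

In this toy the value of taking $5 is 5 for every epsilon > 0, so the limit is trivially 5; the sketch only shows where the division by zero appears and how exploration removes it, not whether such a limit is well-behaved in general.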

The RL case is different and is more reminiscent of, e.g., simulated annealing, where adding noise to an optimization procedure and letting the noise tend to zero over time improves performance compared to a more greedy approach. I don’t think these are quite the same thing as what’s happening in the EDT situation here; it seems to me like an application of the same technique for quite different purposes.
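For contrast, here is a minimal sketch of the simulated-annealing pattern described above: the noise level (temperature) starts high and decays geometrically toward zero, so early steps often accept uphill moves and late steps are essentially greedy. The objective `bumpy`, the Metropolis-style acceptance rule, the step size, and the cooling schedule are all illustrative assumptions, not anything specified in this thread.

```python
import math
import random


def bumpy(x):
    # Hypothetical objective: global minimum 0 at x = 0, many local minima near the integers.
    return x * x + 10 * (1 - math.cos(2 * math.pi * x))


def simulated_annealing(f, x0, t0=10.0, cooling=0.995, steps=5000, step_size=0.5, seed=0):
    """Minimize f starting from x0. The temperature t (the noise level) starts at
    t0 and is multiplied by `cooling` every step, so it tends to zero over time."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)
        fc = f(candidate)
        # Always accept improvements; accept a worse move with probability exp(-delta / t).
        # At high t this is a noisy random walk; as t -> 0 it becomes greedy descent.
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling  # let the noise go to zero
    return best_x, best_f


best_x, best_f = simulated_annealing(bumpy, x0=3.0)
print(f"best x = {best_x:.3f}, f(best x) = {best_f:.3f}")
```

Starting from x0 = 3.0, a purely greedy search with the same step size would stay in the basin of the local minimum near x = 3, since every point within 0.5 of it outside that basin is worse; whether a given annealing run reaches the global minimum depends on the seed and schedule, so no output is shown.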
- jessicata 19 Apr 2022 0:20 UTC
  7 points
  
  > Here it’s done mainly to sidestep an issue of “dividing by zero”, which makes me think that there’s some kind of argument which sidesteps it by using limits or something like that.
  
  Here’s my attempt at sidestepping: EDT solves 5 and 10 with conditional oracles.