Ege Erdil comments on What is causality to an evidential decision theorist?

Ege Erdil 18 Apr 2022 21:45 UTC
1 point
I don’t think this is a real-world problem, because you can just do some kind of relaxation by adding random noise to your actions and then letting the standard deviation go to zero. In practice there aren’t perfectly deterministic systems anyway.

It’s likely that some strategy like that also works in theory & has already been worked out by someone, but in any event it doesn’t seem like a serious obstacle unless the “renormalization” ends up being dependent on which procedure you pick, which seems unlikely.
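
To make the relaxation concrete, here is a minimal sketch on a toy 5-and-10-style choice. Everything in it is an assumption made for illustration: the two actions, the `utility` function, and the discrete noise level `eps` (a stand-in for the standard deviation of continuous noise). With `eps = 0` the conditional expectation for the action never taken is 0/0; for any `eps > 0` it is defined, and in this toy case it does not depend on `eps`, so the limit is unambiguous.

```python
from fractions import Fraction

# Hypothetical toy setup: the agent takes $5 or $10 and utility equals the amount taken.
ACTIONS = (5, 10)


def utility(action):
    return action


def action_distribution(intended, eps):
    """P(A = a): take the intended action with probability 1 - eps, else pick uniformly."""
    return {
        a: (1 - eps) * (1 if a == intended else 0) + eps * Fraction(1, len(ACTIONS))
        for a in ACTIONS
    }


def conditional_expected_utility(action, intended, eps):
    """E[U | A = action], or None when P(A = action) = 0 and the ratio is 0/0."""
    p_action = action_distribution(intended, eps)[action]
    if p_action == 0:
        return None  # conditioning on a probability-zero event
    joint = p_action * utility(action)  # sum over u of u * P(U = u, A = action)
    return joint / p_action


# Deterministic policy: the untaken action has no defined conditional expectation.
print(conditional_expected_utility(5, intended=10, eps=0))

# Noisy policy: defined for every eps > 0, and here constant as eps shrinks.
for eps in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 10**6)):
    print(eps, conditional_expected_utility(5, intended=10, eps=eps))
```

In this toy case the utility depends only on the action, so the limit trivially exists; the open question in the comment is whether it stays independent of the noise procedure in less trivial environments.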
- tailcalled 18 Apr 2022 21:57 UTC
  3 points
  This is called epsilon-exploration in RL.
  - Ege Erdil 18 Apr 2022 22:05 UTC
    1 point
    I think epsilon-exploration is done for different reasons, but there are a bunch of cases in which “add some noise and then let the noise go to zero” is a viable strategy to solve problems. Here it’s done mainly to sidestep an issue of “dividing by zero”, which makes me think that there’s some kind of argument which sidesteps it by using limits or something like that. It feels similar to what happens when you try to divide by zero when differentiating a function.
    
    The RL case is different and is more reminiscent of e.g. simulated annealing, where adding noise to an optimization procedure and letting the noise tend to zero over time improves performance compared to a more greedy approach. I don’t think these are quite the same thing as what’s happening with the EDT situation here; it seems to me like an application of the same technique for quite different purposes.
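    
    To spell out the analogy with differentiation, here is a rough sketch in assumed notation, where $P_\varepsilon$ is the distribution you get when the action is perturbed by noise of size $\varepsilon$. In both cases the expression is 0/0 at the point itself, and the quantity of interest is defined as the limit of a ratio that is well defined nearby:

    ```latex
    \[
    f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h},
    \qquad
    \mathbb{E}[U \mid A = a] := \lim_{\varepsilon \to 0}
      \frac{\sum_u u \, P_\varepsilon(U = u,\, A = a)}{P_\varepsilon(A = a)}
    \]
    ```

    Whether the second limit exists, and whether it is the same for every way of adding noise, is exactly the part that would still need an argument.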
    - jessicata 19 Apr 2022 0:20 UTC
      7 points
      
      > Here it’s done mainly to sidestep an issue of “dividing by zero”, which makes me think that there’s some kind of argument which sidesteps it by using limits or something like that.
      
      Here’s my attempt at sidestepping: EDT solves 5 and 10 with conditional oracles.