I hear there is a way to fiddle with the foundations of probability theory so that conditional probabilities are taken as basic and ordinary probabilities are defined in terms of them. Maybe this would solve the problem?
This does help somewhat. See here. But, in order to get good answers from that, you need to already know enough about the structure of the situation.
Maybe I’m late to the party, in which case sorry about that & I look forward to hearing why I’m wrong, but I’m not convinced that epsilon-exploration is a satisfactory way to ensure that conditional probabilities are well-defined. Here’s why:
I agree, but I also think there are some things pointing in the direction of “there’s something interesting going on with epsilon exploration”. Specifically, there’s a pretty strong analogy between epsilon exploration and modal UDT: MUDT is like the limit as you send the exploration probability to zero, so exploration never actually happens, but it still happens in nonstandard models. However, that only seems to work when you already know the structure of the situation logically. When you have to learn that structure, you have to actually explore sometimes to get it right.
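To make the "you have to actually explore sometimes" point concrete, here is a minimal toy sketch. It is not MUDT or anyone's actual proposal, just a two-armed bandit with deterministic payoffs; the function name `run_agent` and all parameters are made up for illustration. The idea is that the learned estimate of "reward given action a" only exists for actions that were actually taken, so with zero exploration the estimate for the untried action stays undefined.

```python
import random


def run_agent(epsilon, true_rewards, steps=10_000, seed=0):
    """Epsilon-greedy agent on a toy bandit (hypothetical setup).

    Returns the empirical mean reward per action, or None for any
    action that was never taken (the conditional is undefined there).
    """
    rng = random.Random(seed)
    n = len(true_rewards)
    counts = [0] * n
    totals = [0.0] * n

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an action uniformly at random.
            action = rng.randrange(n)
        else:
            # Exploit: act greedily on current estimates.
            # Assumption: untried actions get a default estimate of 0.0.
            estimates = [
                totals[a] / counts[a] if counts[a] else 0.0 for a in range(n)
            ]
            action = max(range(n), key=lambda a: estimates[a])

        # Payoffs are deterministic in this toy example.
        counts[action] += 1
        totals[action] += true_rewards[action]

    return [totals[a] / counts[a] if counts[a] else None for a in range(n)]


no_exploration = run_agent(epsilon=0.0, true_rewards=[1.0, 5.0])
with_exploration = run_agent(epsilon=0.1, true_rewards=[1.0, 5.0])
print(no_exploration)    # second entry is None: that action was never tried
print(with_exploration)  # both entries are defined
```

Without exploration the agent locks onto the first action it tries and never learns that the other one pays more. Note that this only illustrates the learning case; it says nothing about whether the explored outcomes are the right counterfactuals, which is the worry raised above.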
To the extent that MUDT looks like a deep result about counterfactual reasoning, I take this as a point in favor of epsilon exploration telling us something about the deep structure of counterfactual reasoning.
Anyway, see here for some more recent thoughts of mine. (But I didn’t discuss the question of epsilon exploration as much as I could have.)