Maybe I’m late to the party, in which case sorry about that & I look forward to hearing why I’m wrong, but I’m not convinced that epsilon-exploration is a satisfactory way to ensure that conditional probabilities are well-defined. Here’s why:
What ends up happening if I do action A often depends on why I did it. For example, if someone else is deciding how to treat me, and I defect against them, but it’s because of epsilon-exploration rather than because that’s what my reasoning process concluded, then they would likely be inclined to forgive me and cooperate with me in the future. So the conditional probability will be well-defined, but defined incorrectly: it will say that the probability of them cooperating with me in the future, conditional on me defecting now, is high.
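To make the worry concrete, here is a toy calculation. It is only a sketch under made-up assumptions: the other party can tell why I defected, exploration always produces a defection, and the names and numbers (`forgive_explore`, `forgive_deliberate`, 9/10, 1/10) are hypothetical, chosen purely for illustration.

```python
from fractions import Fraction


def p_cooperate_given_defect(eps, p_deliberate, forgive_explore, forgive_deliberate):
    """P(they cooperate later | I defect now) in a toy two-cause model.

    Hypothetical assumptions, not taken from any particular formalism:
      - with probability eps my action is overridden by exploration,
        and exploration always defects (a simplification);
      - otherwise my reasoning chooses to defect with probability p_deliberate;
      - the other party forgives an exploratory defection with probability
        forgive_explore and a deliberate one with probability forgive_deliberate.
    """
    p_defect_explore = eps
    p_defect_deliberate = (1 - eps) * p_deliberate
    p_defect = p_defect_explore + p_defect_deliberate
    if p_defect == 0:
        # Without exploration and without deliberate defection,
        # the conditional probability is undefined.
        raise ZeroDivisionError("conditioning on a probability-zero action")
    joint = (p_defect_explore * forgive_explore
             + p_defect_deliberate * forgive_deliberate)
    return joint / p_defect


if __name__ == "__main__":
    eps = Fraction(1, 100)
    forgiving, unforgiving = Fraction(9, 10), Fraction(1, 10)

    # My reasoning never defects: every observed defection is exploratory,
    # so the conditional just reports the forgiveness rate for exploration.
    print(p_cooperate_given_defect(eps, 0, forgiving, unforgiving))

    # My reasoning always defects: the conditional is dominated by the
    # much lower forgiveness rate for deliberate defection.
    print(p_cooperate_given_defect(eps, 1, forgiving, unforgiving))
```

In this toy model, an agent whose reasoning never defects only ever sees exploratory defections, so the conditional it learns reflects how exploration is treated, not how a deliberate defection would be treated.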
I hear there is a way to fiddle with the foundations of probability theory so that conditional probabilities are taken as basic and ordinary probabilities are defined in terms of them. Maybe this would solve the problem?
I hear there is a way to fiddle with the foundations of probability theory so that conditional probabilities are taken as basic and ordinary probabilities are defined in terms of them. Maybe this would solve the problem?
This does help somewhat. See here. But, in order to get good answers from that, you need to already know enough about the structure of the situation.
Maybe I’m late to the party, in which case sorry about that & I look forward to hearing why I’m wrong, but I’m not convinced that epsilon-exploration is a satisfactory way to ensure that conditional probabilities are well-defined. Here’s why:
I agree, but I also think there are some things pointing in the direction of “there’s something interesting going on with epsilon exploration”. Specifically, there’s a pretty strong analogy between epsilon exploration and modal UDT: MUDT is like the limit as you send the exploration probability to zero, so exploration never actually happens, but it still happens in nonstandard models. However, that only seems to work when you know the structure of the situation logically. When you have to learn it, you have to actually explore sometimes to get it right.
To the extent that MUDT looks like a deep result about counterfactual reasoning, I take this as a point in favor of epsilon exploration telling us something about the deep structure of counterfactual reasoning.
Anyway, see here for some more recent thoughts of mine. (But I didn’t discuss the question of epsilon exploration as much as I could have.)