You say two things that seem in conflict with one another.
[Excerpt 1] If a system is well-described by a causal diagram, then it satisfies a complex set of statistical relationships. For example … To an evidential decision theorist, these kinds of statistical relationships are the whole story about causality, or at least about its relevance to decisions.

[Excerpt 2] [Suppose] that there is a complicated causal diagram containing X and Y, such that my beliefs satisfy all of the statistical relationships implied by that causal diagram. EDT recommends maximizing the conditional expectation of Y, conditioned on all the inputs to X. [emphasis added]
In [1], you say that the EDT agent only cares about the statistical relationships between variables, i.e. the joint distribution P(V) over the set of variables V in a Bayes net (a BN that apparently need not even be causal), and nothing more.
In [2], you say that the EDT agent needs to know the parents of X. This indicates that the agent needs to know something that is not entailed by P(V), and something that is apparently causal.
Maybe you want the agent to know some causal relationships, i.e. the relationships with decision-parents, but not others?
Under these conditions, it’s easy to see that intervening on X is the same as conditioning on X.
This is true for decisions that are in the support, given the assignment to the parents, but not otherwise. CDT can form an opinion about actions that “never happen”, whereas EDT cannot.
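To spell out the identity at issue (a standard causal-Bayes-net fact, not a quote from either excerpt): once you condition on all of X’s parents, the interventional and observational conditionals agree wherever the action has positive probability,

$$
P\bigl(Y \mid \mathrm{do}(X = x),\ \mathrm{pa}(X) = z\bigr) \;=\; P\bigl(Y \mid X = x,\ \mathrm{pa}(X) = z\bigr) \quad \text{whenever } P\bigl(X = x \mid \mathrm{pa}(X) = z\bigr) > 0,
$$

and the right-hand side simply has no value when that probability is zero, which is where the two come apart.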
In [2], you say that the EDT agent needs to know the parents of X. This indicates that the agent needs to know something that is not entailed by P(V).
“The parents of X” is stuff like the observations that the agent has made, as well as the policy the agent uses. It is bog-standard for EDT to use this in its decisions, and because of the special nature of those variables, it does not require knowing an entire causal model.
The EDT agent needs to know the inputs to its own decision process to even make decisions at all, so I don’t think there’s a causal implication there. Obviously no decision theory can get off the ground if it’s not permitted to have any inputs. It’s just that, in a causal model, those inputs would be represented as causal arrows going from the inputs to the decision node.
If by “coinciding for decisions that are in the support” you mean what I think that means, then that’s true re: actions that never happen, but it’s not clear why actions that never happen should influence your assessment of how a decision theory works. Implicitly when you do anything probabilistic you assume that sets of null measure can be thrown away without changing anything.
If by “coinciding for decisions that are in the support” you mean what I think that means, then that’s true re: actions that never happen, but it’s not clear why actions that never happen should influence your assessment of how a decision theory works. Implicitly when you do anything probabilistic you assume that sets of null measure can be thrown away without changing anything.
The issue is that you need to actually condition on the actions that never happen to work out what their expected utility would be, which is necessary in order to decide not to take them.
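Concretely, the quantity EDT needs is a ratio (this is just the definition of a conditional expectation, nothing specific to this setup):

$$
E[Y \mid X = x] \;=\; \frac{\sum_y y \, P(Y = y,\ X = x)}{P(X = x)},
$$

which has no value at an action with P(X = x) = 0, whereas the interventional quantity P(Y | do(X = x)) is still defined from the causal model.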
I don’t think this is a real-world problem, because you can just do some kind of relaxation by adding random noise to your actions and then letting the standard deviation go to zero. In practice there aren’t perfectly deterministic systems anyway.
It’s likely that some strategy like that also works in theory & has already been worked out by someone, but in any event it doesn’t seem like a serious obstacle unless the “renormalization” ends up being dependent on which procedure you pick, which seems unlikely.
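As a sanity check on that suggestion, here is a minimal sketch of the relaxation on a toy 5-and-10 problem. The setup and all of the names (utility, smoothed_policy, and so on) are my own illustration, not something from the post or the literature:

```python
# Toy 5-and-10 problem: a deterministic "take ten" policy, smoothed with
# epsilon of uniform noise so that every action has positive probability.
# Illustrative sketch only; the setup and names here are made up.

ACTIONS = ["five", "ten"]


def utility(action):
    """Payoff of each action in the toy problem."""
    return 5.0 if action == "five" else 10.0


def smoothed_policy(epsilon):
    """Deterministic 'take ten' choice, mixed with epsilon of uniform noise."""
    return {
        "five": epsilon / 2,       # reachable only through the noise
        "ten": 1 - epsilon / 2,    # deterministic choice plus its share of the noise
    }


def conditional_expected_utility(action, epsilon):
    """E[U | A = action], computed explicitly as E[U * 1{A = action}] / P(A = action).

    The final division is exactly where the unrelaxed (epsilon = 0) policy
    breaks down: P(A = "five") would be zero and the ratio undefined.
    """
    probs = smoothed_policy(epsilon)
    joint = sum(p * utility(a) for a, p in probs.items() if a == action)
    return joint / probs[action]


if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001):
        print(eps, {a: conditional_expected_utility(a, eps) for a in ACTIONS})
    # For every eps > 0 both conditional expectations are defined, and as
    # eps -> 0 they converge to 5 and 10, so the limiting recommendation is "ten".
```

In this toy case the limit obviously does not depend on how the noise is added; whether that stays true in less trivial setups is exactly the “renormalization” worry above.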
This is called epsilon-exploration in RL.

I think epsilon-exploration is done for different reasons, but there are a bunch of cases in which “add some noise and then let the noise go to zero” is a viable strategy to solve problems. Here it’s done mainly to sidestep an issue of “dividing by zero”, which makes me think that there’s some kind of argument which sidesteps it by using limits or something like that. It feels similar to the division by zero you would naively run into when differentiating a function, which taking a limit resolves (I try to sketch this below).
The RL case is different and is more reminiscent of e.g. simulated annealing, where adding noise to an optimization procedure and letting the noise tend to zero over time improves performance compared to a more greedy approach. I don’t think these are quite the same thing as what’s happening with the EDT situation here; it seems to me like an application of the same technique for quite different purposes.
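For what it’s worth, the limit version I’m gesturing at would look something like this (my guess at the shape of the argument, not something I have seen worked out): mix the agent’s policy with ε of uniform noise, call the result π_ε, and define

$$
U(x) \;:=\; \lim_{\varepsilon \to 0^{+}} E_{\pi_{\varepsilon}}\!\bigl[\,Y \mid X = x\,\bigr],
$$

where the inner expectation is defined for every ε > 0 because every action then has positive probability. The open question is whether the limit depends on how the noise is added, which is the same “renormalization” worry from earlier in the thread.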
Here it’s done mainly to sidestep an issue of “dividing by zero”, which makes me think that there’s some kind of argument which sidesteps it by using limits or something like that.
Here’s my attempt at sidestepping: EDT solves 5 and 10 with conditional oracles.