First of all, it seems to me that “updateless CDT” and “updateless EDT” are the same for agents with access to their own internal states immediately prior to the decision theory computation: on an appropriate causal graph, such internal states would be the only nodes with arrows leading to the “output of decision theory” node, so if their values are known, severing those arrows does not affect the update on an observation of the value of the “output of decision theory” node. So the counterfactual and conditional probability distributions are the same, and thus CDT and EDT are the same.
Anyway, here is another example where CDT is superior to EDT, without obviously bad agent design. Suppose an agent needs to decide whether to undertake a mission that may either succeed or fail, with utilities:
Not trying: 0
Trying and failing: −1000
Trying and succeeding: 100
The agent initially believes that its probability of success is 50%, but it can perform an expensive computation (costing 10 utility) to update this probability to either 1% or 99%. Under any decision theory, if the new probability is 1% then it will not try, and if it is 99% then it will try. However, the agent also has to decide whether to perform the computation in the first place and, if not, whether to try anyway. Under CDT, the choice is easy: running the computation gives an expected utility of
0.5(0) + 0.5(0.99(100) − 0.01(1000)) − 10 = 34.5
trying without computation gives an expected utility of
0.5(100) − 0.5(1000) = −450
and not trying without computation gives a utility of 0, so the agent performs the computation.
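For concreteness, here is a minimal sketch of the CDT calculation above; the utilities, probabilities, and computation cost are the ones stated in the example, and the function and variable names are just illustrative.

```python
# Minimal sketch of the CDT expected-utility comparison in the example.
U_NOT_TRY = 0
U_FAIL = -1000
U_SUCCEED = 100
COMPUTE_COST = 10

def eu_try(p_success):
    """Expected utility of trying, given a probability of success."""
    return p_success * U_SUCCEED + (1 - p_success) * U_FAIL

# Option 1: run the computation first. With probability 0.5 it reports 1%
# (the agent then declines to try); with probability 0.5 it reports 99%
# (the agent then tries).
eu_compute = (0.5 * max(U_NOT_TRY, eu_try(0.01))
              + 0.5 * max(U_NOT_TRY, eu_try(0.99))
              - COMPUTE_COST)

# Option 2: try immediately at the 50% prior.
eu_try_blind = eu_try(0.5)

# Option 3: do nothing.
eu_not_try = U_NOT_TRY

print(eu_compute, eu_try_blind, eu_not_try)  # ~34.5, -450.0, 0 (up to float rounding)
```

Setting COMPUTE_COST to 0 gives roughly 44.5 for the first option, which connects to the later remark that the cost is not essential to the point.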
Under EDT, the equilibrium solution cannot be to always perform the computation. Indeed, if it were so, then the expected utility for trying without computation would be
0.99(100) − 0.01(1000) = 89
while the other two expected utilities would be the same, so the agent would try without computation. (If the agent observes itself trying, it infers that it must have done so because it computed the probability as 99%, and thus the probability of success must be 99%.)
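To spell out the conditioning in that parenthetical, here is a small sketch; the “always run the computation” policy being conditioned on is the hypothetical from this paragraph, and the variable names are only illustrative.

```python
# Conditioning step: under the hypothetical "always run the computation"
# policy, observing the action "try" pins down the 99% branch.
p_report_99 = 0.5       # prior probability the computation reports 99%
p_try_given_99 = 1.0    # under the policy, the agent tries iff it sees 99%
p_try_given_1 = 0.0

p_try = p_report_99 * p_try_given_99 + (1 - p_report_99) * p_try_given_1
p_99_given_try = p_report_99 * p_try_given_99 / p_try   # Bayes: = 1.0

p_success_given_try = p_99_given_try * 0.99 + (1 - p_99_given_try) * 0.01
eu_try_conditional = (p_success_given_try * 100
                      + (1 - p_success_given_try) * (-1000))
print(eu_try_conditional)  # ~89 (up to float rounding)
```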
The actual equilibrium would likely be to usually run the computation but sometimes try without it. That is beside the point, though: what matters is that EDT recommends something different from the correct CDT solution.
Note that the computation cost is in fact irrelevant to the example; I only introduced it to motivate the decision about whether to run the computation at all. EDT would recommend a non-optimal solution even if running the computation were free.
As before, the key feature of this example is that while computing expected utilities, the agent does not have access to information about its internal states prior to choosing its action: it must choose to either update its priors about them using information from its counterfactual action (EDT) or sever this causal connection before updating (CDT).
As far as I can tell, there is no analogous setup where EDT is preferable to CDT.
Note: The reason the idea of this example can’t be used in the setup of the OP is that in the OP, the inaccessible variable (the utility function) is actually used in the decision theory computation, whereas in my example the inaccessible variable (the probability of success) is inaccessible because it is hard to compute, so it wouldn’t make sense to use it in the computation.
First of all, it seems to me that “updateless CDT” and “updateless EDT” are the same for agents with access to their own internal states immediately prior to the decision theory computation: on an appropriate causal graph, such internal states would be the only nodes with arrows leading to the “output of decision theory” node, so if their values are known, severing those arrows does not affect the update on an observation of the value of the “output of decision theory” node. So the counterfactual and conditional probability distributions are the same, and thus CDT and EDT are the same.
I don’t think “any appropriate causal graph” necessarily has the structure you suggest. (We don’t have a good idea for what causal graphs on logical uncertainty look like.) It’s plausible that your assertion is true, but not obvious.
(If the agent observes itself trying, it infers that it must have done so because it computed the probability as 99%, and thus the probability of success must be 99%.)
EDT isn’t nearly this bad. I think a lot of people have this idea that EDT goes around wagging tails of dogs to try to make the dogs happy. But, EDT doesn’t condition on the dog’s tail wagging: it conditions on personally wagging the dog’s tail, which has no a priori reason to be correlated with the dog’s happiness.
Similarly, EDT doesn’t just condition on “trying”: it conditions on everything it knows, including that it hasn’t yet performed the computation. The only equilibrium solution will be for the AI to run the computation every time except on exploration rounds. It sees that it does quite poorly on the exploration rounds where it tries without running the computation, so it never chooses to do that.
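As an illustration of why the conditional estimates come apart here, the following sketch has an agent estimate each option’s value from forced exploration rounds while conditioning on not having computed yet. The exploration scheme and all names are made up for illustration; this is not a real EDT implementation.

```python
import random

random.seed(0)

def try_mission(p_success):
    """Sample the utility of trying, given a probability of success."""
    return 100 if random.random() < p_success else -1000

# Empirical value estimates, conditioned on what the agent knows at decision
# time ("I have not yet run the computation").
totals = {"try_without_computing": 0.0, "compute_first": 0.0}
counts = {"try_without_computing": 0, "compute_first": 0}

for _ in range(10_000):
    action = random.choice(list(totals))          # forced exploration round
    if action == "try_without_computing":
        utility = try_mission(0.5)                # only the 50% prior applies
    else:
        verdict = random.choice([0.01, 0.99])     # the computation's report
        utility = (try_mission(verdict) if verdict > 0.5 else 0) - 10
    totals[action] += utility
    counts[action] += 1

for action in totals:
    print(action, totals[action] / counts[action])
# "try_without_computing" comes out around -450 and "compute_first" around
# 34.5, so the agent never picks the former outside exploration rounds.
```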
My intuition is that if you are trying to draw causal graphs that do something other than draw arrows from x to f(x) (where f is something like a decision theory), then you are doing something wrong. But I guess I could be wrong about this. However, the point stands that it won’t be possible to distinguish CDT and EDT without either finding a plausible causal graph which doesn’t have the property I want, or talking about an agent that doesn’t have access to its own internal states immediately prior to the decision theory computation (and it seems reasonable to say that such an agent is badly designed).
After thinking more specifically about my example, I think it rests on mistakenly conflating two different ways of dividing the process in question into “action” versus “computation”. I think it is still good as an example of approximately how badly designed an agent needs to be before CDT and EDT become distinct.