First of all, it seems to me that “updateless CDT” and “updateless EDT” are the same for agents with access to their own internal states immediately prior to the decision-theory computation: on an appropriate causal graph, those internal states would be the only nodes with arrows leading into the “output of decision theory” node, so if their values are known, severing those arrows does not change the result of updating on an observed value of the “output of decision theory” node. The counterfactual and conditional probability distributions are therefore the same, and thus CDT and EDT are the same.
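The claim that conditioning and intervening coincide once the action node’s parents are observed can be checked numerically on a toy graph. The sketch below is my own illustration, not anything from the discussion: a hypothetical three-node network where S (internal state) is the only parent of A (the “output of decision theory” node) and both feed into an outcome Y; all probabilities are made-up values.

```python
import random

random.seed(0)

def sample(intervene_a=None):
    # S: internal state, the only node with an arrow into A
    s = random.random() < 0.5
    # A: either computed from S, or forced by intervention (arrow severed)
    if intervene_a is None:
        a = random.random() < (0.9 if s else 0.1)
    else:
        a = intervene_a
    # Y: outcome depends on both S and A (illustrative numbers)
    y = random.random() < (0.8 if (a and s) else 0.2)
    return s, a, y

def p_y(samples):
    return sum(y for _, _, y in samples) / len(samples)

N = 200_000
# EDT-style: condition on A = True and on the known parent S = True
cond = [t for t in (sample() for _ in range(N)) if t[0] and t[1]]
# CDT-style: intervene do(A = True), still conditioning on S = True
intv = [t for t in (sample(intervene_a=True) for _ in range(N)) if t[0]]

# Because S, the sole parent of A, is held fixed, severing the S -> A
# arrow leaves the distribution over Y unchanged.
print(p_y(cond), p_y(intv))
```

Both estimates come out essentially equal, matching the argument that once every parent of the decision node is known, the interventional and conditional distributions agree.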
I don’t think “any appropriate causal graph” necessarily has the structure you suggest. (We don’t have a good idea for what causal graphs on logical uncertainty look like.) It’s plausible that your assertion is true, but not obvious.
(If the agent observes itself trying, it infers that it must have done so because it computed the probability as 99%, and thus the probability of success must be 99%.)
EDT isn’t nearly this bad. I think a lot of people have this idea that EDT goes around wagging tails of dogs to try to make the dogs happy. But, EDT doesn’t condition on the dog’s tail wagging: it conditions on personally wagging the dog’s tail, which has no a priori reason to be correlated with the dog’s happiness.
Similarly, EDT doesn’t just condition on “trying”: it conditions on everything it knows, including that it hasn’t yet performed the computation. The only equilibrium solution will be for the AI to run the computation every time except on exploration rounds. It sees that it does quite poorly on the exploration rounds where it tries without running the computation, so it never chooses to do that.
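The equilibrium described above can be sketched with a toy simulation of my own devising (the payoff numbers are invented for illustration): an agent that conditions on its accumulated evidence, with occasional forced exploration rounds, learns that acting without running the computation does badly and so never chooses it deliberately.

```python
import random

random.seed(1)

EPSILON = 0.05   # fraction of rounds that are exploration rounds
ROUNDS = 50_000

def reward(ran_computation):
    # Assumed payoffs: acting on the computed answer succeeds far more often
    p = 0.99 if ran_computation else 0.30
    return 1.0 if random.random() < p else 0.0

# reward sum and count for each choice (True = run the computation)
totals = {True: [0.0, 0], False: [0.0, 0]}

for _ in range(ROUNDS):
    if random.random() < EPSILON:
        # exploration round: act at random, ignoring the evidence
        choice = random.random() < 0.5
    else:
        # EDT-style choice: pick the option whose observed conditional
        # expected reward is higher, given everything seen so far
        means = {c: (s / n if n else 0.5) for c, (s, n) in totals.items()}
        choice = means[True] >= means[False]
    r = reward(choice)
    totals[choice][0] += r
    totals[choice][1] += 1

mean_compute = totals[True][0] / totals[True][1]
mean_skip = totals[False][0] / totals[False][1]
print(mean_compute, mean_skip)
```

In equilibrium the agent runs the computation on every non-exploration round; the only data on “trying without computing” come from exploration rounds, and those data are exactly what keep it from ever choosing that option.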
My intuition is that if you are trying to draw causal graphs that do something other than draw arrows from x to f(x) (where f is something like a decision theory), then you are doing something wrong. But I guess I could be wrong about this. However, the point stands that it won’t be possible to distinguish CDT and EDT without either finding a plausible causal graph which doesn’t have the property I want, or talking about an agent that doesn’t have access to its own internal states immediately prior to the decision theory computation (and it seems reasonable to say that such an agent is badly designed).
After thinking more specifically about my example, I think it rests on mistakenly conflating two different ways of dividing the process in question into action vs. computation. I think it still works as an example of approximately how badly designed an agent needs to be before CDT and EDT come apart.