My confusion, however, was that this didn’t seem to be what the SEP article was suggesting, as it runs a proof showing that even under this second partition CDT will suggest two-boxing.
The way they manage that is by defining the expected utility of an action using the “probabilities of conditionals”, which are written using the notation P(A > E) where A is an action and E is an event. These “probabilities of conditionals” encode the causal information that CDT relies on.
In my previous comment I described CDT as seeking the value of a which maximizes the expression EU(a) = Sum(over x) P(X = x) * E(U | X = x and A = a), and observed that replacing X with a different random variable may change the expected utility. However, if we change P(X = x) to P(a > (X = x)), so that EU(a) = Sum(over x) P(a > (X = x)) * E(U | X = x and A = a), then EU(a) remains unchanged if we replace X with another random variable X’, as long as the pair (X’, A) uniquely determines X. (One obvious choice of X’ is to just take X’(w) = w. This corresponds to what SEP describes as Sobel’s “basic formula”.)
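To make the partition dependence concrete, here is a rough Python sketch of a toy Newcomb model. The 99% predictor accuracy, the uniform prior over the agent's action, the $1,000,000 / $1,000 payoffs and every function name are assumptions picked for illustration; none of them come from SEP. P(a > E) is computed with the Sum(over x) formula given further down in this comment, taking X to be the prediction.

```python
from fractions import Fraction
from itertools import product

ACTIONS = ("one", "two")
ACCURACY = Fraction(99, 100)   # assumed predictor accuracy
PRIOR = Fraction(1, 2)         # assumed prior over the agent's action

# A world w is a pair (prediction, action).
WORLDS = list(product(ACTIONS, ACTIONS))
P = {(p, a): PRIOR * (ACCURACY if p == a else 1 - ACCURACY) for p, a in WORLDS}

def utility(w):
    pred, act = w
    return (1_000_000 if pred == "one" else 0) + (1_000 if act == "two" else 0)

# Three candidate random variables to partition on.
def prediction(w): return w[0]
def correct(w): return w[0] == w[1]
def identity(w): return w          # X'(w) = w

def prob(event):
    return sum(P[w] for w in WORLDS if event(w))

def cond_exp_utility(X, x, a):
    """E(U | X = x and A = a), or None when the condition has probability 0."""
    ws = [w for w in WORLDS if X(w) == x and w[1] == a]
    mass = sum(P[w] for w in ws)
    return None if mass == 0 else sum(P[w] * utility(w) for w in ws) / mass

def p_conditional(a, event):
    """P(a > E) = Sum(over x) P(pred = x) * P(E | pred = x and A = a)."""
    total = Fraction(0)
    for x in ACTIONS:
        ws = [w for w in WORLDS if w[0] == x and w[1] == a]
        mass = sum(P[w] for w in ws)
        hit = sum(P[w] for w in ws if event(w))
        total += prob(lambda w: w[0] == x) * hit / mass
    return total

def eu(X, a, weight):
    """Sum(over x) weight(x) * E(U | X = x and A = a), skipping undefined terms."""
    total = Fraction(0)
    for x in {X(w) for w in WORLDS}:
        e = cond_exp_utility(X, x, a)
        if e is not None:
            total += weight(x) * e
    return total

def eu_plain(X, a):
    # Weights are the unconditional probabilities P(X = x).
    return eu(X, a, lambda x: prob(lambda w: X(w) == x))

def eu_causal(X, a):
    # Weights are the probabilities of conditionals P(a > (X = x)).
    return eu(X, a, lambda x: p_conditional(a, lambda w: X(w) == x))

for name, X in [("prediction", prediction), ("correct", correct)]:
    print(name, "plain :", [int(eu_plain(X, a)) for a in ACTIONS])
for name, X in [("prediction", prediction), ("correct", correct), ("identity", identity)]:
    print(name, "causal:", [int(eu_causal(X, a)) for a in ACTIONS])
```

With these assumed numbers, the plain formula favours two-boxing when X is the prediction but one-boxing when X is "was the prediction correct?", whereas the P(a > ...) version gives the same pair of expected utilities for all three choices of X.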
What’s a bit frustrating about this is that the axioms for “probabilities of conditionals” are never spelled out. However, I suspect that defining P(A > E) for all A and E is equivalent to defining a random variable “X” as in my previous comment:
In one direction, if expressions of the form P(A > E) are well-defined then say that an event E is ‘unaffected’ if and only if P(A > E) = P(E) for all actions A. Then we can define X(w) = (E(w) : All unaffected events E), which is “the state of the world immediately prior to the action”.
In the other direction, if we’re given X then we can define P(a > E) as Sum(over x) P(X = x) * P(E | X = x and A = a). Then the meaning of the expression P(a > E) will be: “The probability of E turning out true if we
1. Compute a random ‘history of the world’ (including a value of A), then
2. ‘Surgically’ change the value of A to a, and
3. Recompute the future light cone of A.”
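As a sanity check on that reading, the sketch below compares the Sum(over x) definition with a literal "surgery" computation in a toy Newcomb model. Everything specific here is an assumption for illustration: the 99% accuracy, the uniform prior over the action, the payoffs, the function names, and the choice to let the "future light cone" of A consist of nothing but the payoff. It enumerates histories exactly rather than sampling them.

```python
from fractions import Fraction
from itertools import combinations, product

ACTIONS = ("one", "two")
ACCURACY = Fraction(99, 100)   # assumed predictor accuracy

def payoff(pred, act):
    return (1_000_000 if pred == "one" else 0) + (1_000 if act == "two" else 0)

def history(pred, act):
    # The payoff is the only thing downstream of the action in this toy model.
    return (pred, act, payoff(pred, act))

# Joint distribution over histories, with an assumed uniform prior on the action.
P = {history(p, a): Fraction(1, 2) * (ACCURACY if p == a else 1 - ACCURACY)
     for p, a in product(ACTIONS, ACTIONS)}

def p_sum(a, event):
    """Sum(over x) P(X = x) * P(E | X = x and A = a), with X = prediction."""
    total = Fraction(0)
    for x in ACTIONS:
        p_x = sum(p for h, p in P.items() if h[0] == x)
        mass = sum(p for h, p in P.items() if h[0] == x and h[1] == a)
        hit = sum(p for h, p in P.items()
                  if h[0] == x and h[1] == a and event(h))
        total += p_x * hit / mass
    return total

def p_surgery(a, event):
    """Draw a history, overwrite A with a, recompute what lies downstream."""
    total = Fraction(0)
    for (pred, _old_act, _old_payoff), p in P.items():  # step 1: random history
        rewritten = history(pred, a)                     # steps 2 and 3
        if event(rewritten):
            total += p
    return total

def p_given_action(a, event):
    """Ordinary conditional probability P(E | A = a), for contrast."""
    mass = sum(p for h, p in P.items() if h[1] == a)
    return sum(p for h, p in P.items() if h[1] == a and event(h)) / mass

# Check agreement on every event, i.e. every subset of the four histories.
histories = list(P)
for r in range(len(histories) + 1):
    for subset in combinations(histories, r):
        for a in ACTIONS:
            assert p_sum(a, lambda h: h in subset) == p_surgery(a, lambda h: h in subset)

def rich(h):
    return h[2] >= 1_000_000

print(p_surgery("one", rich), p_given_action("one", rich))
```

In this model the two definitions agree on all sixteen events, while the last line shows how P(a > E) differs from plain conditioning on A = a: the surgical probability of a payoff of at least $1,000,000 after one-boxing is 1/2, against 99/100 for P(E | A = one).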
[Strictly speaking, I need to prove that if you define X in terms of P(a > E) and then redefine P(a > E) in terms of X then you get back what you started off with. I don’t know how to do that because I don’t know what the axioms for ‘probabilities of conditionals’ are. But it works in the Newcomb example.]
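For what it's worth, the Newcomb round trip can be checked by brute force. The sketch below is only that check on one toy model, not a proof of anything general: the 99% accuracy, the uniform prior over the action and all names are assumptions, and the starting P(a > E) is itself generated from X = prediction because there are no independent axioms to start from.

```python
from fractions import Fraction
from itertools import combinations, product

ACTIONS = ("one", "two")
ACCURACY = Fraction(99, 100)                 # assumed predictor accuracy
WORLDS = list(product(ACTIONS, ACTIONS))     # w = (prediction, action)
P = {(p, a): Fraction(1, 2) * (ACCURACY if p == a else 1 - ACCURACY)
     for p, a in WORLDS}

# Every event, i.e. every subset of the four worlds.
EVENTS = [frozenset(c) for r in range(len(WORLDS) + 1)
          for c in combinations(WORLDS, r)]

def prob(worlds):
    return sum((P[w] for w in worlds), Fraction(0))

def conditionals_from(X):
    """Return the map (a, E) -> Sum(over x) P(X = x) * P(E | X = x and A = a)."""
    def p_cond(a, E):
        total = Fraction(0)
        for x in {X(w) for w in WORLDS}:
            cell = [w for w in WORLDS if X(w) == x]
            with_a = [w for w in cell if w[1] == a]
            total += prob(cell) * prob(w for w in with_a if w in E) / prob(with_a)
        return total
    return p_cond

# Start from the probabilities of conditionals induced by X = prediction.
original = conditionals_from(lambda w: w[0])

# An event is 'unaffected' iff P(A > E) = P(E) for all actions A.
unaffected = [E for E in EVENTS
              if all(original(a, E) == prob(E) for a in ACTIONS)]

def x_recovered(w):
    """X(w) = (E(w) : all unaffected events E)."""
    return tuple(w in E for E in unaffected)

# Redefine P(a > E) from the recovered X and compare with the original.
roundtrip = conditionals_from(x_recovered)
assert all(original(a, E) == roundtrip(a, E) for a in ACTIONS for E in EVENTS)

for E in unaffected:
    print(sorted(E))
```

Under these assumptions the unaffected events turn out to be the empty event, the whole space, "predicted one-box" and "predicted two-box", so the recovered X carries exactly the information in the prediction and the redefined P(a > E) matches the original on every event.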
Therefore, it seems slightly perverse for SEP to put such emphasis on achieving ‘partition independence’ when making CDT work at all requires choosing a partition, whether explicitly (e.g. by choosing “X”) or implicitly (by defining probabilities of conditionals). It seems like it’s just a cosmetic difference.
Maybe the proof they use to show that CDT reaches the same answer under both partitions in Newcomb’s problem is designed to show how a partition-invariant form of CDT works rather than how a partition-dependent form of CDT works.
Yeah, that’s what I think.
Thanks. I think I’m starting to understand it all.