In my HAART example I gave you p(O=oj | a) explicitly. In that example using the EDT formula results in going to jail.
Not in my understanding. What you gave was P(O’=oj | a’), which looks similar, but talks about different RVs. That is the point I was trying to make by saying that “a random variable only ever has one value”.
Look, just go read about causal models. You are confused about very basic things.
Fair enough, I will do some reading when I have the time. Do you have any pointers to minimize the amount I have to read, or should I just read all of Pearl’s book?
ETA:
This also means that the nodes in a causal model are not random variables.
A (real valued) random variable X is a function with type “X : Ω → R”, where Ω is the sample space. There are two ways to treat causal models:
1. Each node X represents a random variable X. Different instances (e.g. patients) correspond to different sample points ω ∈ Ω.
2. Each node X represents a sequence of random variables X_i. Different patients correspond to different indices. The sample space Ω contains the entire real world, or at least all possible patients as well as the agent itself.
Interpretation 1 is the standard one, I think. I was advocating the second view when I said that nodes are not random variables. I suppose I could have been clearer.
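To make the difference between the two readings concrete, here is a minimal sketch in Python. Everything in it (the patient names, the world names, the values) is made up for illustration; the only thing taken from the discussion above is that a random variable is a function from the sample space to the reals.

```python
# Interpretation 1: one random variable X per node.
# A sample point is a patient, so the sample space is a set of patients.
omega_1 = ["patient_a", "patient_b", "patient_c"]  # hypothetical sample space
_x_table = {"patient_a": 1.0, "patient_b": 0.0, "patient_c": 1.0}

def X(w):
    """X : omega_1 -> R, evaluated at sample point (patient) w."""
    return _x_table[w]

# Interpretation 2: a sequence of random variables X_i, one per patient index.
# A sample point is now an entire world that fixes the value for every patient.
omega_2 = ["world_0", "world_1"]  # hypothetical sample space
_world_table = {
    "world_0": [1.0, 0.0, 1.0],
    "world_1": [0.0, 0.0, 1.0],
}

def X_i(i):
    """Return the random variable X_i : omega_2 -> R for patient index i."""
    return lambda w: _world_table[w][i]

# In world_0 the sequence X_0, X_1, X_2 takes the same values that X takes
# across the three patients under interpretation 1.
values_1 = [X(w) for w in omega_1]
values_2 = [X_i(i)("world_0") for i in range(3)]
```

Under the first reading a different patient means a different argument to the same function; under the second it means a different function evaluated at the same world.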
“Most of the causal inference community” agrees that causal models are made up of potential outcomes, which on the unit level are propositional logical variables that determine how some “unit” (person, etc.) responds in a particular way (Y) to a hypothetical intervention on the direct causes of Y. If we don’t know which unit we are talking about, we average over them to get a random variable Y(pa(Y)). This view is almost a century old now (Neyman, 1923), and is a classic view in statistics.
I think it’s fine if you want to advocate a “new view” on things. I am just worried that you might be suffering from a standard LW disease of trying to be novel without adequately understanding the state of play, and why the state of play is the way it is.
At the end of the day, CDT is a “model,” and “all models are wrong.” However, it gives the right answer to the HAART question, and moreover the only way to give the right answer to these kinds of questions is to be isomorphic to the “CDT algorithm” for these kinds of questions.
“Most of the causal inference community” agrees that causal models are made up of potential outcomes, which on the unit level are propositional logical variables that determine how some “unit” (person, etc.) responds in a particular way (Y) to a hypothetical intervention on the direct causes of Y.
Is Y a particular way of responding (e.g. Y = “the person dies”), or is it a variable that denotes whether the person responds in that way (e.g. Y=1 if the person dies and 0 otherwise)? I think you meant the latter.
If we don’t know which unit we are talking about, we average over them to get a random variable Y(pa(Y)).
How does averaging over propositional logical variables give you a random variable? I am afraid I am getting confused by your terminology.
I think it’s fine if you want to advocate a “new view” on things. I am just worried that you might be suffering from a standard LW disease of trying to be novel without adequately understanding the state of play, and why the state of play is the way it is.
I wasn’t trying to be novel for the sake of it. Rather, I was just trying to write down my thoughts on the subject. As I said before, if you have some specific pointers to the state of the art in this field, then that would be much appreciated. Note that I have a background in computer science and machine learning, so I am somewhat familiar with causal models.
and moreover the only way to give the right answer to these kinds of questions is to be isomorphic to the “CDT algorithm” for these kinds of questions.
That sounds interesting. Do you have a link to a proof of this statement?
Is Y a particular way of responding (e.g. Y = “the person dies”), or is it a variable that denotes whether the person responds in that way (e.g. Y=1 if the person dies and 0 otherwise)? I think you meant the latter.
The latter.
How does averaging over propositional logical variables give you a random variable? I am afraid I am getting confused by your terminology.
There is uncertainty about which unit u we are talking about (given by some p(u) we do not see). So instead of a propositional variable assignment Y(pa(y), u) = y, we have an event with a probability p{ Y(pa(y)) = y } = \sum_{u : Y(pa(y), u) = y} p(u).
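As a minimal sketch of that sum in Python: the three units, their weights p(u), and the response table are all invented for illustration, and in practice p(u) is exactly the thing we do not get to see.

```python
from fractions import Fraction

# Hypothetical population of units with weights p(u) (assumed, not observed).
p_u = {"u1": Fraction(1, 2), "u2": Fraction(1, 4), "u3": Fraction(1, 4)}

# Unit-level potential outcomes Y(a, u): the deterministic response of unit u
# when the parent of Y is set to a. Values are illustrative only.
Y = {
    ("treat", "u1"): 1, ("treat", "u2"): 0, ("treat", "u3"): 1,
    ("none", "u1"): 0, ("none", "u2"): 0, ("none", "u3"): 1,
}

def prob_potential_outcome(a, y):
    """p{Y(a) = y}: total weight p(u) of the units u with Y(a, u) = y."""
    return sum((p for u, p in p_u.items() if Y[(a, u)] == y), Fraction(0))

p_treat = prob_potential_outcome("treat", 1)
p_none = prob_potential_outcome("none", 1)
```

Each unit-level entry is a plain true/false fact; the probability only appears once we weight those facts by p(u).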
That sounds interesting. Do you have a link to a proof of this statement?
I am not sure I made a formal enough statement to prove. I guess:
(a) if you believe that your domain is acyclic causal, and
(b) you know what the causal structure is, and
(c) your utility is a function of the outcomes sitting in your causal system, and
(d) your actions on a variable embedded in your causal system break causal links operating from usual direct causes to the variable, and
(e) your domain isn’t “crazy” enough to demand adjustments along the lines of TDT,
then the right thing to do is to use CDT.
These preconditions hold in the HAART example. I am not sure exactly how to formalize (e) (I am not sure anyone knows how; this is part of what is open).
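A minimal sketch of what changes when (a) through (d) hold, in Python. The numbers are made up and are not the ones from the HAART example; the structure is an assumed one-confounder model in which health status U affects both the treatment A and death Y. The sketch compares plain conditioning on observed A with the interventional quantity obtained by cutting the link from U to A.

```python
from fractions import Fraction as F

# Assumed structure: U -> A, U -> Y, A -> Y. All numbers are illustrative.
p_u = {"sick": F(1, 2), "healthy": F(1, 2)}
p_treat_given_u = {"sick": F(9, 10), "healthy": F(1, 10)}  # observed policy
p_death = {  # P(Y = 1 | U = u, A = a)
    ("sick", 1): F(2, 5), ("sick", 0): F(4, 5),
    ("healthy", 1): F(1, 20), ("healthy", 0): F(1, 10),
}

def p_a_given_u(a, u):
    """P(A = a | U = u) under the observed treatment policy."""
    return p_treat_given_u[u] if a == 1 else 1 - p_treat_given_u[u]

def death_given_observed(a):
    """P(Y = 1 | A = a): conditioning on the observed action."""
    joint = sum(p_u[u] * p_a_given_u(a, u) * p_death[(u, a)] for u in p_u)
    marginal = sum(p_u[u] * p_a_given_u(a, u) for u in p_u)
    return joint / marginal

def death_under_intervention(a):
    """P(Y(a) = 1): setting A = a breaks the U -> A link, per condition (d)."""
    return sum(p_u[u] * p_death[(u, a)] for u in p_u)
```

With these numbers, conditioning makes treatment look harmful (73/200 against 17/100) because sicker patients are the ones who get treated, while the interventional quantity shows treatment lowering the death rate (9/40 against 9/20).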