Well, kinda. I am not sure whether the final output—the joint densities of outcomes—will be different in a causal model compared to a properly specified conventional model.
To continue with the same example, it suffers from the expression “wet grass” meaning two different things: either “I see wet grass” or “I made the grass wet”. This is your difference between just (a=1) and do(a=1), but conventional non-causal modeling doesn’t have huge problems with this; it is fully aware of the difference.
And I don’t know if it’s necessary to formalize intervention. I freely concede that it’s useful in certain areas, but I’m not so sure that’s true for all of them.
So, we could add a node to the graph for every single node, which corresponds to whether or not that node was the subject of an intervention. So you would talk about P(rain|grass is wet, ~I made it rain, ~I made the grass wet) vs. P(rain|grass is wet, ~I made it rain, I made the grass wet). But this means doubling the number of nodes in the graph (which, since the number of probabilities is exponential in the number of nodes for a discrete model, is a terrible idea). You also might want to throw in a lot of consistency constraints that are not guaranteed to hold in an arbitrary graph, which makes things more awkward.
It is much simpler, conceptually and practically, to just have a rule to determine how interventions differ from observations in updating the state of the graph, that is, talking about P(rain|grass is wet) vs. P(rain|do(grass is wet)).
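To make that rule concrete, here is a minimal sketch in plain Python (the rain/sprinkler/wet-grass network and all its numbers are made up for illustration; this is not anyone’s library API). Conditioning uses the joint distribution as given, while do(grass is wet) first performs the “graph surgery” of cutting the edges into the wet-grass node, i.e. replacing P(wet | rain, sprinkler) with an indicator, and only then conditions.

```python
# Toy sketch of conditioning vs. do(): the do() case cuts the edges into the
# intervened node (graph surgery) before conditioning.  All parameters below
# are made-up illustrative numbers.

from itertools import product

# P(rain), P(sprinkler | rain), P(wet | rain, sprinkler) -- assumed values
P_RAIN = 0.2
P_SPRINKLER = {0: 0.5, 1: 0.1}                       # keyed by rain
P_WET = {(0, 0): 0.0, (0, 1): 0.9,                   # keyed by (rain, sprinkler)
         (1, 0): 0.8, (1, 1): 0.95}

def joint(rain, sprinkler, wet, intervene_wet=None):
    """Joint probability; if intervene_wet is set, replace P(wet | parents)
    with an indicator -- this is the 'graph surgery' step of do()."""
    p = P_RAIN if rain else 1 - P_RAIN
    p *= P_SPRINKLER[rain] if sprinkler else 1 - P_SPRINKLER[rain]
    if intervene_wet is None:
        p *= P_WET[(rain, sprinkler)] if wet else 1 - P_WET[(rain, sprinkler)]
    else:
        p *= 1.0 if wet == intervene_wet else 0.0
    return p

def p_rain_given(wet_value, do=False):
    """P(rain=1 | wet=wet_value) or P(rain=1 | do(wet=wet_value))."""
    intervene = wet_value if do else None
    num = den = 0.0
    for rain, sprinkler in product((0, 1), repeat=2):
        p = joint(rain, sprinkler, wet_value, intervene_wet=intervene)
        den += p
        if rain:
            num += p
    return num / den

print("P(rain | grass is wet)     =", round(p_rain_given(1), 3))          # > 0.2: wet grass is evidence of rain
print("P(rain | do(grass is wet)) =", round(p_rain_given(1, do=True), 3))  # = 0.2: making it wet tells us nothing
```

With these toy numbers, observing wet grass raises the probability of rain from 0.2 to about 0.31, while setting the grass wet by hand leaves it at the prior 0.2, since rain is not downstream of the intervention.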
In fact, Phil Dawid does precisely this. What he ends up with is still interventions. (Of course he (I think!) does not believe in counterfactuals, but that is a long discussion.)
That assumes we’re doing graphs and networks.
My problems in this subthread really started when the causal model was defined as “a set of joint distributions defined over potential outcome random variables”—notice how nothing like networks or interventions is mentioned here—and I got curious why a plain-vanilla Bayesian model which also produces a set of joint distributions doesn’t qualify.
It probably just was a bad definition.
Sorry, this is a response to an old comment, but it’s an easy question to clarify.
A potential outcome Y(a) is a random variable under an intervention, e.g. Y under do(a). It’s just a different notation from a different branch of statistics.
We may or may not choose to use graphs to represent causality (or indeed probability). Some people like graphs, others do not. Graphs do not add anything; they are just a visual representation.
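To illustrate that definition, here is a small self-contained simulation (toy numbers invented for the example, not taken from the thread) in which every unit carries both potential outcomes Y(0) and Y(1). With a hidden confounder, the interventional mean E[Y(1)], i.e. E[Y | do(a=1)], comes out different from the observational E[Y | A=1], which is exactly the (a=1) vs. do(a=1) gap discussed upthread.

```python
# Toy simulation of potential outcomes: Y(a) is "Y under do(a)".
# With a confounder, E[Y(1)] (interventional) differs from E[Y | A=1] (conditioning).

import random

random.seed(0)
n = 200_000

y1_sum = 0.0          # running sum of the potential outcome Y(1) over all units
y_obs_treated = []    # observed Y among units that happened to have A=1

for _ in range(n):
    u = random.random() < 0.5             # hidden confounder
    # treatment is more likely when u is true
    a = random.random() < (0.8 if u else 0.2)
    # potential outcomes: u raises Y regardless of treatment; treatment adds 1
    y0 = (2.0 if u else 0.0) + random.gauss(0, 1)
    y1 = y0 + 1.0
    y1_sum += y1
    if a:
        y_obs_treated.append(y1)           # consistency: observed Y is Y(A)

print("E[Y(1)]  ~", round(y1_sum / n, 2))                                  # about 2.0
print("E[Y|A=1] ~", round(sum(y_obs_treated) / len(y_obs_treated), 2))     # about 2.6, confounded upward
```

The notation is doing the same work either way: Y(a) packages “Y under do(a)” as a random variable, and the distinction shows up whether or not a graph is ever drawn.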