Well, first off, Pearl would remind you that reduction doesn’t have to mean probability distributions. If Markov models are simple explanations of our observations, then what’s the problem with using them?
The surface-level answer to your question would be to talk about how to interconvert between causal graphs and probabilities, thereby identifying any function on causal graphs (like setting the value of a node without updating its parents) with an operator on probability distributions (given the graphical model). Note that in common syntax, “conditioning” on do()-ing something means applying the operator to the probability distribution. But you can google this or find it in Pearl’s book Causality.
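To make the contrast concrete, here is a toy sketch of my own (all numbers made up) on the graph Z → X, Z → Y, X → Y. Conditioning on X = 1 renormalizes the joint distribution, while do(X = 1) sets X's value without updating its parent Z:

```python
# Toy structural model on Z -> X, Z -> Y, X -> Y; binary variables,
# made-up parameters purely for illustration.
p_z = {0: 0.5, 1: 0.5}
p_x_given_z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}   # p_x_given_z[z][x]
p_y1_given_xz = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}  # P(Y=1 | x, z)

def p_joint(z, x, y):
    py1 = p_y1_given_xz[(x, z)]
    return p_z[z] * p_x_given_z[z][x] * (py1 if y == 1 else 1 - py1)

# Conditioning: renormalize over the worlds where X = 1.
num = sum(p_joint(z, 1, 1) for z in (0, 1))
den = sum(p_joint(z, 1, y) for z in (0, 1) for y in (0, 1))
print("P(Y=1 | X=1)     =", num / den)   # ~0.83

# Intervening: cut the Z -> X edge, keep P(Z) fixed, set X = 1.
print("P(Y=1 | do(X=1)) =", sum(p_z[z] * p_y1_given_xz[(1, z)] for z in (0, 1)))  # 0.75
```

The two numbers differ because conditioning on X = 1 also tells you something about Z, while intervening does not.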
I’d just like you to think more about what you want from an “explanation.” What is it you want to know that would make things feel explained?
If Markov models are simple explanations of our observations, then what’s the problem with using them?
To be clear, by total probability distribution I mean a distribution over all possible conjunctions of events. A Markov model also creates a total probability distribution, but there are multiple Markov models with the same probability distribution. Believing in a Markov model is more specific, and so if we could do the same work with just probability distributions, then Occam would seem to demand we do.
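For concreteness, here is a toy illustration (made-up numbers) of two Markov models, X → Y and Y → X, that induce exactly the same joint distribution:

```python
# Model A: X -> Y, with arbitrary made-up parameters.
pA_x1 = 0.4
pA_y1_given_x = {0: 0.25, 1: 0.75}

# Model B: Y -> X, with parameters chosen via Bayes' rule to match Model A.
pB_y1 = pA_x1 * pA_y1_given_x[1] + (1 - pA_x1) * pA_y1_given_x[0]    # P(Y=1) = 0.45
pB_x1_given_y = {
    1: pA_x1 * pA_y1_given_x[1] / pB_y1,                # P(X=1 | Y=1)
    0: pA_x1 * (1 - pA_y1_given_x[1]) / (1 - pB_y1),    # P(X=1 | Y=0)
}

for x in (0, 1):
    for y in (0, 1):
        pa = ((pA_x1 if x else 1 - pA_x1)
              * (pA_y1_given_x[x] if y else 1 - pA_y1_given_x[x]))
        pb = ((pB_y1 if y else 1 - pB_y1)
              * (pB_x1_given_y[y] if x else 1 - pB_x1_given_y[y]))
        print((x, y), round(pa, 6), round(pb, 6))   # identical joints, different graphs
```

No amount of observational data can tell the two apart, yet they disagree about what do()-ing would change.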
The surface-level answer to your question would be to talk about how to interconvert between causal graphs and probabilities… But you can google this or find it in Pearl’s book Causality.
My understanding is that you can't infer a causal graph from just a probability distribution. You need either causal assumptions or experiments to do that, and experimenting involves do()ing, so I'm asking whether what do()ing is can be explained in non-causal terms.
I’d just like you to think more about what you want from an “explanation.” What is it you want to know that would make things feel explained?
If there were a way to infer causal structure from just probability distributions, that would be an explanation. Inferring it from something else might also work, but that depends on what the something is, and I don't think I can give you a list of viable options in advance.
Alternatively, you might say that causality can't be reduced to anything else. In that case, I would like to know how I come to have beliefs about causality, and why this gives true answers. I have something like that for probability distributions: I have a prior and a rule for updating it (how I come to believe it), and a theorem saying that if I do that, then in the limit I'll always do at least as well as my best hypothesis with probability ≠ 0 in the prior (why it works).
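To spell out what I have for probability distributions, here is a minimal sketch of that prior-plus-update-rule picture, with a made-up two-hypothesis prior over a coin's bias:

```python
import random

# Prior over two hypotheses about P(heads); the update rule is Bayes' rule.
# If the true bias has probability != 0 in the prior, the posterior
# concentrates on it in the limit.
random.seed(0)
posterior = {0.3: 0.5, 0.7: 0.5}   # hypothesis -> current weight
true_bias = 0.7

for _ in range(1000):
    heads = random.random() < true_bias
    for h in posterior:                        # multiply in the likelihood...
        posterior[h] *= h if heads else 1 - h
    total = sum(posterior.values())            # ...and renormalize.
    for h in posterior:
        posterior[h] /= total

print(posterior)   # nearly all mass on 0.7
```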
I think Judea Pearl would answer that the do() operator is the most reductionist explanation possible. The point of the do-calculus is precisely that it can't be found in the data (that is the difference between do(x) and "see(x)") and requires causal assumptions. Without a causal model, there is no do() operator. And conversely, one cannot create a causal model from pure data alone; the do() operator sits on a higher rung of the "ladder of causation" than bare probabilities.
I feel like there's a partial answer to your last question in that do-calculus is to causal reasoning what Bayes' rule is to probability. The do-calculus can be derived from the rules of probability plus the introduction of the do() operator, but the do() operator itself cannot be explained in non-causal terms. Pearl believes we inherently use some version of the do-calculus when we think about causality.
These ideas are all in Pearl's "The Book of Why".
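To make the derivation claim concrete, here is a sketch (made-up observational joint, and the *assumed* graph Z → X, Z → Y, X → Y) of the back-door adjustment, one identity the do-calculus licenses: P(y | do(x)) = Σ_z P(z) P(y | x, z). Everything on the right-hand side is observational, but the identity only holds under the causal assumptions, which is exactly the point:

```python
# Made-up observational joint P(z, x, y) over binary variables; sums to 1.
joint = {
    (0, 0, 0): 0.18, (0, 0, 1): 0.02, (0, 1, 0): 0.15, (0, 1, 1): 0.15,
    (1, 0, 0): 0.04, (1, 0, 1): 0.06, (1, 1, 0): 0.08, (1, 1, 1): 0.32,
}

def marg(pred):
    """Sum the joint over all (z, x, y) satisfying the predicate."""
    return sum(p for zxy, p in joint.items() if pred(*zxy))

def p_y1_do_x(x):
    """Back-door adjustment; valid only if Z blocks all back-door paths."""
    total = 0.0
    for z in (0, 1):
        p_z = marg(lambda zz, xx, yy: zz == z)
        p_y1_xz = (marg(lambda zz, xx, yy: (zz, xx, yy) == (z, x, 1))
                   / marg(lambda zz, xx, yy: (zz, xx) == (z, x)))
        total += p_z * p_y1_xz
    return total

p_see = marg(lambda z, x, y: (x, y) == (1, 1)) / marg(lambda z, x, y: x == 1)
print("P(Y=1 | do(X=1)) =", p_y1_do_x(1))   # 0.65
print("P(Y=1 | X=1)     =", p_see)          # ~0.67
```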
But now I think your question is: where do the models come from? For researchers, the causal models they create come from background knowledge of the problem they're working on: a confounder is possible between these parameters but not those, because of randomization, and so on.
But how does a newborn child or a blank AI system acquire causal models? If that can be explained, then we have answered your question. I don't have a good answer.
I myself think (but I haven't given it enough thought) that there might be a bridge from data to causal models through falsification. Take a list of possible causal models for a given problem and search through your data. You might not be able to prove your assumptions, but you might be able to rule causal models out if they posit a causal relation between two variables that show no correlation at all.
The trouble is, you don't know whether you can really rule such a model out, or whether there is a correlation that simply doesn't show up in the data because of a confounder. It seems plausible to me that children just assume the missing correlation is real, adopt one of the remaining causal models until new evidence proves them wrong, and so enter an iterative process that eventually leads to a good causal model. But again, this idea isn't well developed.
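Here is a rough sketch of the idea (toy data; note it leans on exactly the assumption in question, that a causal relation would show up as correlation):

```python
import random

# Two independent variables; under "faithfulness" (no correlations hidden by
# exactly cancelling paths or confounders), models positing an edge between
# them can be ruled out.
random.seed(1)
xs = [random.gauss(0, 1) for _ in range(5000)]
ys = [random.gauss(0, 1) for _ in range(5000)]   # generated independently of xs

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n
    va = sum((ai - ma) ** 2 for ai in a) / n
    vb = sum((bi - mb) ** 2 for bi in b) / n
    return cov / (va * vb) ** 0.5

# Candidate causal models and whether each predicts a correlation.
candidates = {"X -> Y": True, "Y -> X": True, "no edge": False}
dependent = abs(corr(xs, ys)) > 0.05   # crude threshold standing in for a real test
surviving = [m for m, needs_corr in candidates.items() if needs_corr == dependent]
print(surviving)   # only "no edge" survives on this data
```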
But how does a newborn child or a blank AI system acquire causal models?
I see no problem with assuming that you start out with a prior over causal models; we do the same for probabilistic models, after all. The question is how the updating works, and whether, assuming the world has a causal structure, this way of updating can identify it.
I myself think (but I haven't given it enough thought) that there might be a bridge from data to causal models through falsification. Take a list of possible causal models for a given problem and search through your data. You might not be able to prove your assumptions, but you might be able to rule causal models out if they posit a causal relation between two variables that show no correlation at all.
This can never distinguish between different causal models that predict the same probability distribution; all the advantage this would have over purely probabilistic updating would already be contained in the prior.
To update in a way that distinguishes between causal models, you need to update on the information that do(event) is true for some event. In this case you could allow each causal model to decide, for the purposes of its own updating, when that is true, so you are now allowed to define it in causal terms. Given what I wrote in the question, this would still need some work: you can't really change something independently of its causal antecedents, at least not when we're talking about the whole world, which includes you, but perhaps some notion of independence would suffice. And then you would have to show that this really does converge on the true causal structure, if there is one.
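To sketch what such updating might look like, reusing the made-up numbers from my earlier illustration (and letting each model say for itself what do(X=1) predicts):

```python
import random

# Models A (X -> Y) and B (Y -> X) share the same observational joint, so
# only interventional data can separate them. Under do(X=1):
#   Model A predicts Y ~ Bernoulli(P(Y=1 | X=1)) = Bernoulli(0.75).
#   Model B predicts forcing X leaves Y at its marginal, Bernoulli(0.45).
random.seed(2)
likelihood = {"A: X -> Y": 0.75, "B: Y -> X": 0.45}
posterior = {"A: X -> Y": 0.5, "B: Y -> X": 0.5}   # prior over causal models

# Simulate a world whose true structure is Model A, repeatedly do(X=1).
for _ in range(200):
    y = random.random() < 0.75
    for m in posterior:
        posterior[m] *= likelihood[m] if y else 1 - likelihood[m]
    total = sum(posterior.values())
    for m in posterior:
        posterior[m] /= total

print(posterior)   # nearly all mass on "A: X -> Y"
```

The key move is that the likelihood of the same observation y differs between the models only because the data is flagged as interventional.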