Thanks for sharing! It’s nice to see plasticity, especially in stats, which seems to have more opinionated contributors than other areas of applied maths. That said, this ‘admission’ doesn’t seem to change his framework; rather, it reinterprets how ML is used so that it becomes compatible with his framework.
Pearl’s texts talk about causal models that use the do(X) operator (e.g. P(Y|do(X))) to signify causal information. Now, with LLMs, he sees the text the model is conditioning on as sometimes representing do(X) and sometimes plain X. I’m curious what else besides text would count as this. I’m not sure I recall this correctly, but in his third level you can use purely observational data to infer causality with things like instrumental variables. If I had an ML model that took purely numerical inputs, such as (tar, smoking status, got cancer, and various other health data), should it be able to predict counterfactual results?
I’m uncertain what the right answer is here, or how Pearl would view it. My guess is that a naive ML model would be able to do this provided the data covered the counterfactual cases, which is likely for the smoking example. But it would not be as useful for out-of-sample counterfactual inferences where there is little or no coverage of the interventions and outcomes (e.g. if one of the inputs was ‘location’ and it had to predict the effects of smoking on the ISS, where no one smokes). However, if we kept adding more purely observational information about the universe, it feels like we might be able to get a causal model out of a transformer-like thing. I’m aware there are some tools that try to extract a DAG from the data as a primitive form of this approach, but this is at odds with the Bayesian stats approach of having a DAG first and then checking whether the DAG holds with the data, or vice versa. Please share if you have references that would be useful.
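To make the ‘naive ML model’ point concrete, here is a minimal sketch (entirely synthetic data and invented coefficients, just for illustration): fit an off-the-shelf classifier on observational records, then query it with the smoking input flipped, as if conditioning were intervening.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic observational data, for illustration only:
# smoking raises tar, and tar (plus age) raises cancer risk.
n = 5000
smoking = rng.binomial(1, 0.3, n)
tar = smoking * rng.normal(5.0, 1.0, n) + rng.normal(0.5, 0.2, n)
age = rng.normal(50.0, 10.0, n)
p_cancer = 1.0 / (1.0 + np.exp(-(-6.0 + 0.6 * tar + 0.03 * age)))
cancer = rng.binomial(1, p_cancer)

X = np.column_stack([smoking, tar, age])
model = GradientBoostingClassifier().fit(X, cancer)

# “Counterfactual” query for one person: flip the smoking field and ask again.
# The model happily leaves tar at its observed value; it has no notion that
# tar sits downstream of smoking, which is exactly the gap Pearl points at.
person = X[0].copy()
flipped = person.copy()
flipped[0] = 1.0 - flipped[0]
print("observed:", model.predict_proba(person.reshape(1, -1))[0, 1])
print("flipped :", model.predict_proba(flipped.reshape(1, -1))[0, 1])
```

The flipped query answers “what does the model predict for a record whose smoking field differs”, which lines up with a counterfactual only where the observational data actually covers such records; set a hypothetical ‘location’ feature to the ISS and the same call returns a pure extrapolation.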
However, if we kept adding more purely observational information about the universe, it feels like we might be able to get a causal model out of a transformer-like thing.
I think it is true.
If you observe everything in enough detail, and your hypothesis space is complete, you get counterfactual prediction automatically. Theoretical example: a Solomonoff inductor observes the world; physical laws satisfy causality; the best prediction algorithm takes that into account; the inductor’s inference favors that algorithm; and that algorithm can simulate the physical laws, and so produce counterfactuals if needed in the course of its predictions.
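To pin down the hand-waving a little (a sketch of the standard definitions from memory, not part of the argument above): the Solomonoff mixture weights every program consistent with the observations by its length, so programs that implement the true causal dynamics eventually carry most of the predictive weight.

```latex
% Solomonoff prior over observation strings x, with U a universal prefix
% machine and \ell(p) the length of a program p whose output begins with x:
\[
  M(x) \;=\; \sum_{p \,:\, U(p) = x\ast} 2^{-\ell(p)},
  \qquad
  M(a \mid x) \;=\; \frac{M(xa)}{M(x)} .
\]
% The shortest programs that keep reproducing x are (roughly) the ones that
% simulate the underlying physical laws, so they dominate the mixture; running
% such a program on an altered input is a counterfactual rollout.
```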
If you live in a world where counterfactual thinking is possible and useful for predicting the future, then Bayes will bring you there.
An interesting look at the question of counterfactuals is the debate between Pearl and Robins on cross-world independence assumptions. It’s relevant because Robins resolves the paradox of Pearl’s impossible-to-verify assumptions by noting that you can always add a mediator on any arrow of a causal model (I’d add: due to the locality of physical laws), and this makes the assumptions verifiable in principle. In other words, by observing the “full video” of a process, instead of just the frames represented by some random variables, you need fewer out-of-the-hat assumptions to infer counterfactual causal quantities.
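For concreteness, this is the kind of assumption at stake, written from memory in the usual mediation notation (X treatment, M mediator, Y outcome, C baseline covariates), so treat it as a sketch rather than a quotation of either author:

```latex
% Cross-world independence assumption used to identify natural
% direct and indirect effects:
\[
  Y_{x,\,m} \;\perp\!\!\!\perp\; M_{x'} \;\mid\; C
  \qquad \text{for all } m \text{ and } x \neq x' .
\]
% The two counterfactuals live in different “worlds” (treatment set to x in
% one, x' in the other), so no single experiment can realize both for the
% same unit, which is why the assumption is called unverifiable in principle.
```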
I tried to write an explanation, but I realized I still don’t understand the matter well enough to go through the details, so I’ll leave you a reference: the last section, “Mediation”, in this Robins interview.
I’m aware there are some tools that try to extract a DAG from the data as a primitive form of this approach, but this is at odds with the Bayesian stats approach of having a DAG first and then checking whether the DAG holds with the data, or vice versa. Please share if you have references that would be useful.
My superficial impression is that the field of causal discovery does not have its shit together. Not to dunk on them; it’s not a law of nature that what you set out to do will be within your ability. See also “Are there any good, easy-to-understand examples of cases where statistical causal network discovery worked well in practice?”