However, if we kept adding more purely observational information about the universe, it feels like we might be able to get a causal model out of a transformer-like thing.
I think it is true.
If you observe everything in enough detail and your hypothesis space is complete, you get counterfactual prediction automatically. Theoretical example: a Solomonoff inductor observes the world; since Physical laws satisfy causality, the best prediction algorithm takes that into account; the inductor’s inference comes to favor that algorithm; and that algorithm can simulate Physical laws, and so produce counterfactuals when needed in the course of its predictions.
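To make that story concrete, here is a minimal toy sketch of the argument (all rules and names are my own illustration, not anything from Solomonoff induction proper): a Bayesian learner over a tiny hypothesis space comes to favor the hypothesis that simulates the world, and that same simulator can then be run from an intervened state to answer a counterfactual query.

```python
# Toy "inductor": hypotheses are simple update rules for a 1-D state.
# The true world follows rule_true; Bayesian updating on observations
# concentrates the posterior on the hypothesis that simulates the world,
# and that same simulator can then be run under an intervention to
# answer a counterfactual.

def rule_true(x):    return (2 * x + 1) % 17
def rule_wrong_a(x): return (x + 3) % 17
def rule_wrong_b(x): return (5 * x) % 17

hypotheses = {"true": rule_true, "a": rule_wrong_a, "b": rule_wrong_b}
posterior = {name: 1 / 3 for name in hypotheses}  # uniform prior

# Observe a trajectory of the world.
x = 1
trajectory = [x]
for _ in range(10):
    x = rule_true(x)
    trajectory.append(x)

# Likelihood: ~1 if the hypothesis predicts the next observation, else ~0.
for prev, nxt in zip(trajectory, trajectory[1:]):
    for name, rule in hypotheses.items():
        if rule(prev) != nxt:
            posterior[name] *= 1e-9
total = sum(posterior.values())
posterior = {k: v / total for k, v in posterior.items()}

best = max(posterior, key=posterior.get)

# Counterfactual query: "what would the next state have been, had the
# current state been 7?" Run the favored simulator from the intervened state.
counterfactual = hypotheses[best](7)
print(best, counterfactual)
```

The point is only that nothing extra was bolted on: the model that wins at pure prediction is a simulator, and a simulator accepts interventions for free.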
If you live in a world where counterfactual thinking is possible and useful for predicting the future, then Bayes gets you there.
An interesting look at the question of counterfactuals is the debate between Pearl and Robins on cross-world independence assumptions. It’s relevant because Robins resolves the paradox of Pearl’s impossible-to-verify assumptions by noting that you can always add a mediator on any arrow of a causal model (I’d add, due to locality of Physical laws), and this makes the assumptions verifiable in principle. In other words, by observing the “full video” of a process, instead of just the frames represented by some random variables, you need fewer out-of-the-hat assumptions to infer counterfactual causal quantities.
I tried to write an explanation, but I realized I still don’t understand the matter well enough to go through the details, so I’ll leave you a reference: the last section, “Mediation”, in this Robins interview.
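Here is a small simulation of the flavor of the idea, as I understand it (my own illustration, not taken from the interview): refine X → Y into X → M → Y by observing the mediator M, i.e. one more “frame” of the full video. The assumption “X affects Y only through M” then becomes testable in the data as the conditional independence Y ⟂ X | M, rather than an unverifiable cross-world postulate.

```python
import numpy as np

# Simulate a world where X affects Y only through the mediator M.
rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(size=n)
m = 2.0 * x + rng.normal(size=n)   # X -> M
y = -1.5 * m + rng.normal(size=n)  # M -> Y (no direct X -> Y edge)

def partial_corr(a, b, c):
    """Correlation of a and b after linearly regressing out c."""
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(ra, rb)[0, 1]

marginal = abs(np.corrcoef(x, y)[0, 1])  # large: X and Y are dependent
given_m = abs(partial_corr(x, y, m))     # ~0: consistent with X -> M -> Y
print(marginal, given_m)
```

Without the mediator you only see that X and Y are correlated; once M is observed, the “no direct path” assumption becomes something you can actually check against data.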
I’m aware there are some tools that try to extract a DAG from the data as a primitive form of this approach, but this is at odds with the Bayesian-stats approach of positing a DAG first and then checking whether the DAG is consistent with the data (or vice versa). Please share if you have some references that would be useful.
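For a sense of what those tools do, here is a minimal sketch of constraint-based causal discovery (a PC-algorithm-style skeleton search) on synthetic linear-Gaussian data. This is a toy of my own construction, not a production tool; real implementations also handle edge orientation, larger conditioning sets, hidden confounders, and proper significance tests.

```python
from itertools import combinations
import numpy as np

# Synthetic chain: A -> B -> C.
rng = np.random.default_rng(1)
n = 50_000
a = rng.normal(size=n)
b = 1.2 * a + rng.normal(size=n)   # A -> B
c = -0.8 * b + rng.normal(size=n)  # B -> C
data = {"A": a, "B": b, "C": c}

def partial_corr(x, y, z=None):
    """Correlation of x and y, optionally regressing out one variable z."""
    if z is not None:
        x = x - np.polyval(np.polyfit(z, x, 1), z)
        y = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(x, y)[0, 1]

# Start fully connected; drop an edge when some (empty or single-variable)
# conditioning set makes the pair look independent.
edges = set(combinations(data, 2))
for u, v in list(edges):
    others = [w for w in data if w not in (u, v)]
    for s in [None] + others:
        r = partial_corr(data[u], data[v], None if s is None else data[s])
        if abs(r) < 0.02:
            edges.discard((u, v))
            break

print(sorted(edges))  # skeleton recovers A-B and B-C, but not A-C
```

The A–C edge is removed because A and C become independent given B; that is the kind of conditional-independence reasoning these tools automate. Note that even when it works, the output is an equivalence class of DAGs, not a unique one, which is part of why the tension with the “DAG first, then check” workflow you mention is real.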
My superficial impression is that the field of causal discovery does not have its shit together. Not to dunk on them; it’s not a law of Nature that what you set out to do will be within your ability. See also “Are there any good, easy-to-understand examples of cases where statistical causal network discovery worked well in practice?”