What is the exact formal difference/relation between probability distribution P, random variable X, and causal model M?
Good question. I recommend looking at this post. The very short version is:
P isn’t itself a distribution. It’s an operator which takes in a model (i.e. M), and spits out distributions of events/variables defined in that model (i.e. X).
The model M contains some random variables (i.e. X and maybe others), and somehow specifies how to sample them. I usually picture M as either a Judea Pearl-style causal DAG, or a program which calls rand() sometimes.
X is a variable in the model.
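
To make the "program which calls rand()" picture concrete, here is a minimal sketch. Everything in it is an illustrative assumption rather than a standard formalism: the function `model_m`, the variable names, the 0.3 and 0.5 probabilities, and the choice to have `P` estimate the distribution by sampling instead of computing it exactly.

```python
import random
from collections import Counter

def model_m(rng):
    """Hypothetical model M: a program that calls the RNG and defines variables."""
    x = 1 if rng.random() < 0.3 else 0          # X: arbitrary Bernoulli(0.3) choice
    y = x + (1 if rng.random() < 0.5 else 0)    # Y: another variable, depends on X
    return {"X": x, "Y": y}

def P(model, variable, n_samples=100_000, seed=0):
    """Operator: takes a model and a variable name, returns an estimated distribution.

    This is a Monte Carlo approximation, assumed here for simplicity.
    """
    rng = random.Random(seed)
    counts = Counter(model(rng)[variable] for _ in range(n_samples))
    return {value: count / n_samples for value, count in sorted(counts.items())}

dist_x = P(model_m, "X")   # roughly {0: 0.7, 1: 0.3}
dist_y = P(model_m, "Y")   # same operator, same model, different variable
```

The point of the sketch is the types: `model_m` is the model, `"X"` names a variable inside it, and `P` is neither of those. It is the thing that maps (model, variable) to a distribution, which is why the same `P` works for both X and Y.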