The problem is indeed that P(B|A) is insufficient to compute a unique counterfactual—additional causal information is needed. Pearl’s approach is to specify each observable variable as a deterministic function of its parents in the causal graph. Any uncertainty must be represented by a set of “exogenous” variables U, which can feature in the functions for the observables. (See chapter 7 of Causality, or also An Axiomatic Characterization of Causal Counterfactuals.)
For example, your first process could be represented by the following causal model:
A(X) = X
B(A, Y) = ¬(A ⊕ Y)
P(X) = p
P(Y) = 0.75
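To make this concrete, here is a minimal Python sketch of that model and of Pearl's abduction, action, prediction recipe for counterfactuals. The value of p and the particular query ("we saw A=1, B=1; what would B have been under do(A=0)?") are my own illustrative assumptions, not part of your setup:

```python
import itertools

# Structural causal model for the first process (sketch; names are illustrative):
#   A := X,  B := ¬(A ⊕ Y),  with exogenous X ~ Bernoulli(p), Y ~ Bernoulli(0.75).
p = 0.6  # assumed value of p, purely for illustration

def model(x, y, a_override=None):
    """Evaluate the structural equations, optionally intervening with do(A = a_override)."""
    a = x if a_override is None else a_override
    b = 1 - (a ^ y)          # B = ¬(A ⊕ Y)
    return a, b

def prior(x, y):
    """Prior probability of the exogenous setting (X=x, Y=y)."""
    return (p if x else 1 - p) * (0.75 if y else 0.25)

# Counterfactual query: P(B=1 | do(A=0)) given that we observed A=1, B=1.
# Abduction: condition the exogenous variables on the evidence.
evidence = (1, 1)  # observed (A, B)
posterior, z = {}, 0.0
for x, y in itertools.product([0, 1], repeat=2):
    if model(x, y) == evidence:
        posterior[(x, y)] = prior(x, y)
        z += prior(x, y)
posterior = {u: w / z for u, w in posterior.items()}

# Action + prediction: push the updated exogenous distribution through the
# modified equations with A forced to 0.
cf = sum(w for (x, y), w in posterior.items() if model(x, y, a_override=0)[1] == 1)
print(f"P(B=1 | do(A=0), A=1, B=1) = {cf}")  # -> 0.0 for this model
```

Here the evidence pins down X=1, Y=1, so the counterfactual B under do(A=0) is ¬(0 ⊕ 1) = 0.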
The other processes might have different structures, equations, and distributions P(X,Y); it's not possible in general to distinguish these purely from the observable distribution P(A,B).
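As a hedged illustration of that last point (this alternative model is my own example, not necessarily one of your other processes): take a second model in which B listens directly to X rather than to A. It induces exactly the same P(A,B), but gives a different answer under do(A=0):

```python
import itertools

p = 0.6  # illustrative value of p

def model1(x, y, a_override=None):
    a = x if a_override is None else a_override
    return a, 1 - (a ^ y)          # B = ¬(A ⊕ Y): B listens to A

def model2(x, y, a_override=None):
    a = x if a_override is None else a_override
    return a, 1 - (x ^ y)          # B = ¬(X ⊕ Y): B ignores interventions on A

def prior(x, y):
    return (p if x else 1 - p) * (0.75 if y else 0.25)

# Both models induce the same observable distribution P(A, B)...
for model in (model1, model2):
    dist = {}
    for x, y in itertools.product([0, 1], repeat=2):
        ab = model(x, y)
        dist[ab] = dist.get(ab, 0) + prior(x, y)
    print(model.__name__, dist)

# ...but after observing (A=1, B=1), i.e. exogenous setting (X=1, Y=1), the
# counterfactual "B under do(A=0)" differs: 0 in model 1, 1 in model 2.
print(model1(1, 1, a_override=0)[1], model2(1, 1, a_override=0)[1])
```

So two processes can agree on every observable probability and still disagree about counterfactuals, which is why the extra causal structure has to be supplied rather than inferred from P(A,B).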