Thanks for this post, it’s really helpful. I would really like to understand the maths in it. Is there anywhere that describes this in more detail? In particular, I can’t follow:
Why are probabilities being permuted?
What kind of kernel are you referring to?
The definition involving the permutation is a generalization of the example earlier in the post: $\phi(T)$ is the identity and $\phi(H)$ swaps heads and tails, and $X = \phi(A)^{-1}(C)$. In general, if you observe $A = a$ and $C = c$, then the counterfactual statement is that if you had observed $A = a'$, then you would have also observed $C = \phi(a')(\phi(a)^{-1}(c))$.
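To make the rule concrete, here is a minimal Python sketch of the coin example. It assumes that both A and C take values in {"H", "T"}. The names `PHI`, `invert`, `latent`, and `counterfactual` are made up for illustration, and representing each permutation as a dict is just one convenient choice.

```python
# Permutations of the outcome set {"H", "T"}, written as dicts (an illustrative choice).
IDENTITY = {"H": "H", "T": "T"}
SWAP = {"H": "T", "T": "H"}

# phi maps each value of A to a permutation: phi(T) is the identity, phi(H) swaps.
PHI = {"T": IDENTITY, "H": SWAP}


def invert(perm):
    """Return the inverse of a permutation given as a dict."""
    return {image: source for source, image in perm.items()}


def latent(a, c):
    """Recover X = phi(a)^{-1}(c) from the observed A = a and C = c."""
    return invert(PHI[a])[c]


def counterfactual(a, c, a_prime):
    """Value of C had we observed A = a_prime: phi(a_prime)(phi(a)^{-1}(c))."""
    return PHI[a_prime][latent(a, c)]


# Observed A = T and C = H; had A been H, the sketch gives C = T.
print(counterfactual("T", "H", "H"))
```

Setting `a_prime` equal to `a` returns the observed `c` unchanged, which is a quick sanity check that the counterfactual agrees with what was actually observed.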
I just learned about probability kernels thanks to user Diffractor. I might be using them wrong.