I have a question: is d-separation implied by the Kolmogorov axioms?
I’ve proven that it is in some cases:
Premises:
1) P(A) = P(A|B); to be shown: P(A|B,C) ≤ P(A|C)
2) P(C) < P(C|A)
3) P(C) < P(C|B)
4) P(C|A,B) < P(C)
Proof:
1) P(B|C) > P(B) (via premise 3)
2) P(A|B,C) = P(A) P(B) P(C|A,B) / (P(C) P(B|C)) (via premise 1)
3) P(A|B,C) P(C) = P(A) P(B) P(C|A,B) / P(B|C)
4) P(A|B,C) P(C) / P(A) = P(B) P(C|A,B) / P(B|C)
5) P(B) P(C|A,B) / P(B|C) < P(C|A,B) (via line 1)
6) P(B) P(C|A,B) / P(B|C) < P(C) (via line 5 and premise 4)
7) P(A|B,C) P(C) / P(A) < P(C) (via lines 6 and 4)
8) P(A|C) = P(A) P(C|A) / P(C)
9) P(A|C) P(C) = P(A) P(C|A)
10) P(A|C) P(C) / P(A) = P(C|A)
11) P(C) < P(A|C) P(C) / P(A) (via line 10 and premise 2)
12) P(A|B,C) P(C) / P(A) < P(A|C) P(C) / P(A) (via lines 11 and 7)
13) P(A|B,C) < P(A|C)
Q.E.D.
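Not part of the original argument, but as a quick numerical sanity check of this case: the sketch below (the function name check_strict_case and the sampling ranges are just illustrative choices) draws random binary-event distributions with A and B independent by construction, keeps those satisfying premises 2–4, and asserts the conclusion P(A|B,C) < P(A|C) on every qualifying draw.

```python
import random

def check_strict_case(trials=100_000):
    """Randomly draw distributions satisfying premises 1-4 (with P(C|A,B) < P(C))
    and assert the conclusion P(A|B,C) < P(A|C) on every draw that qualifies."""
    hits = 0
    for _ in range(trials):
        pa, pb = random.uniform(0.05, 0.95), random.uniform(0.05, 0.95)
        # P(C | A=a, B=b); drawing P(A) and P(B) separately enforces premise 1 (A, B independent)
        f = {(a, b): random.uniform(0.05, 0.95) for a in (0, 1) for b in (0, 1)}
        pc = sum(f[a, b] * (pa if a else 1 - pa) * (pb if b else 1 - pb)
                 for a in (0, 1) for b in (0, 1))
        pc_given_a = f[1, 1] * pb + f[1, 0] * (1 - pb)
        pc_given_b = f[1, 1] * pa + f[0, 1] * (1 - pa)
        # premises 2 and 3: A alone and B alone are each evidence for C;
        # premise 4 (this case): A and B together are evidence against C
        if not (pc < pc_given_a and pc < pc_given_b and f[1, 1] < pc):
            continue
        hits += 1
        p_a_given_c = pc_given_a * pa / pc            # P(A | C)
        p_a_given_bc = f[1, 1] * pa / pc_given_b      # P(A | B, C), using P(A|B) = P(A)
        assert p_a_given_bc < p_a_given_c
    return hits

print("qualifying draws checked:", check_strict_case())
```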
Premises:
1) P(A) = P(A|B); to be shown: P(A|B,C) ≤ P(A|C)
2) P(C) < P(C|A)
3) P(C) < P(C|B)
4) P(C|A,B) = P(C)
Proof:
1) P(A|C) = P(A) P(C|A) / P(C)
2) P(A|B,C) = P(A) P(B) P(C) / (P(B) P(C|B)) (via premises 1 and 4)
3) P(A|B,C) = P(A) P(C) / P(C|B)
4) P(A) P(C) < P(A) P(C|A) (via premise 2)
5) P(A) P(C) / P(C|B) < P(A) P(C|A) / P(C) (via line 4 and premise 3)
6) P(A|B,C) < P(A|C) (via lines 1, 3, and 5)
Q.E.D.
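The same kind of numerical check works for this case, with one twist: since the equality P(C|A,B) = P(C) has probability zero under random sampling, the sketch below (again only an illustrative construction, with a hypothetical name check_equality_case) solves for the (A=1, B=1) cell of the conditional table so that the equality holds exactly, then asserts the conclusion.

```python
import random

def check_equality_case(trials=100_000):
    """Construct distributions with P(C|A,B) exactly equal to P(C) (premise 4 of this
    case), keep those that also satisfy premises 2 and 3, and assert the conclusion."""
    hits = 0
    for _ in range(trials):
        pa, pb = random.uniform(0.05, 0.95), random.uniform(0.05, 0.95)
        f00, f01, f10 = (random.uniform(0.05, 0.95) for _ in range(3))
        # solve for the (A=1, B=1) cell so that P(C | A, B) = P(C)
        rest = (f00 * (1 - pa) * (1 - pb) + f01 * (1 - pa) * pb + f10 * pa * (1 - pb))
        f11 = rest / (1 - pa * pb)
        pc = f11                                        # by construction P(C) = P(C | A, B)
        pc_given_a = f11 * pb + f10 * (1 - pb)
        pc_given_b = f11 * pa + f01 * (1 - pa)
        if not (pc < pc_given_a and pc < pc_given_b):   # premises 2 and 3
            continue
        hits += 1
        p_a_given_c = pc_given_a * pa / pc              # P(A | C)
        p_a_given_bc = f11 * pa / pc_given_b            # P(A | B, C), using P(A|B) = P(A)
        assert p_a_given_bc < p_a_given_c
    return hits

print("qualifying draws checked:", check_equality_case())
```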
If it is implied by classical probability theory, could someone please refer me to a proof?
I don’t understand your question, or your notation.
d-separation is just a way of talking about separating sets of vertices in a graph by “blocking” paths. It can’t be implied by anything because it is not a statement in a logical language. For “certain” graph/joint distribution pairs, if a d-separation statement holds in the graph, then a corresponding conditional independence statement holds in the joint distribution. This is a statement, and it is proven in Verma and Pearl 1988, as paper-machine below says. Is that the statement you mean? There are lots of interesting true and hard to prove statements one could make involving d-separation.
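To make the "blocking paths" picture concrete, here is a minimal, self-contained sketch of the standard moralization test for d-separation (the encoding of a DAG as a parent map and the function names are just choices made for this illustration, not anything from the thread). Applied to the three-node collider A → C ← B it reports that A and B are d-separated given nothing, but not given C.

```python
from collections import deque

def ancestors(dag, nodes):
    """dag maps each node to the set of its parents; return nodes plus all their ancestors."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for parent in dag.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def d_separated(dag, xs, ys, zs):
    """Moralization test: X and Y are d-separated by Z iff Z separates X and Y in the
    moral (undirected) graph of the ancestral subgraph of X ∪ Y ∪ Z."""
    keep = ancestors(dag, set(xs) | set(ys) | set(zs))
    adj = {v: set() for v in keep}
    for child in keep:
        parents = dag.get(child, set())
        for p in parents:                       # child-parent edges, made undirected
            adj[child].add(p)
            adj[p].add(child)
        for p in parents:                       # "marry" co-parents of the same child
            for q in parents:
                if p != q:
                    adj[p].add(q)
    blocked, reached = set(zs), set(xs) - set(zs)
    frontier = deque(reached)
    while frontier:                             # reachability in the moral graph minus Z
        v = frontier.popleft()
        for w in adj[v] - blocked - reached:
            reached.add(w)
            frontier.append(w)
    return reached.isdisjoint(ys)

# the collider that the premises above describe: A -> C <- B (each node mapped to its parents)
collider = {"A": set(), "B": set(), "C": {"A", "B"}}
print(d_separated(collider, {"A"}, {"B"}, set()))    # True: A and B are d-separated marginally
print(d_separated(collider, {"A"}, {"B"}, {"C"}))    # False: conditioning on C connects them
```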
I guess from a model theorist point of view, it’s a proof in ZF, but it’s high level and “elementary” by model theory standards.
Looking it over, I could have been much clearer (sorry). Specifically, I want to know: given a DAG of the form
A → C ← B
is it true that, in every prior joint distribution where A is independent of B but A is evidence for C and B is evidence for C, A is non-independent of B when C is held constant?
I proved that this is so when A & B together are evidence against C, and also when A & B together are independent of C; the only case I am missing is when A & B together are evidence for C.
It’s clear enough to me that when there is a single non-colliding path between two variables, they cannot be independent, and that if we were to hold any of the variables along that path constant, the two end variables would become independent. This can all be shown with standard probability theory and correlation alone. It can also be shown that if there are only colliding paths between two variables, those two variables are independent. If I have understood the theory of d-separation correctly, then if we hold the collision variable (assuming there is only one) on one of these paths constant, the two variables should become non-independent (either evidence for or against one another). I have proven that this is so in two of the (at least) three cases that fit the given DAG using standard probability theory.
Those are the proofs I gave above.
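For what it is worth, the missing case (A and B jointly being evidence for C) can at least be checked on one concrete distribution. The sketch below is only an illustration, not a proof: A and B are independent fair coins and C = A or B, so A alone, B alone, and A & B jointly are all evidence for C, yet conditioning on C makes B evidence against A.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)
# joint distribution over (A, B, C): A, B independent fair coins, C = A or B
joint = {(a, b, a | b): half * half for a, b in product((0, 1), repeat=2)}

def p(pred):
    """Probability of the event described by pred(a, b, c)."""
    return sum(q for (a, b, c), q in joint.items() if pred(a, b, c))

p_c          = p(lambda a, b, c: c)
p_c_given_a  = p(lambda a, b, c: c and a) / p(lambda a, b, c: a)
p_c_given_ab = p(lambda a, b, c: c and a and b) / p(lambda a, b, c: a and b)
p_a_given_c  = p(lambda a, b, c: a and c) / p_c
p_a_given_bc = p(lambda a, b, c: a and b and c) / p(lambda a, b, c: b and c)

print(p_c, p_c_given_a, p_c_given_ab)   # 3/4, 1, 1: A, and A & B jointly, raise P(C)
print(p_a_given_c, p_a_given_bc)        # 2/3, 1/2: so P(A|B,C) < P(A|C) in this example
```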
No (to the question above), but I think it is true if A, B, C are binary. In general, if a distribution p is Markov relative to a graph G, then whenever something is d-separated in G there is a corresponding independence in p. Importantly, though, the implication does not always go the other way. Distributions in which the implication always goes the other way are very special and are called faithful.
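A standard way to picture an unfaithful distribution (this is a generic textbook-style example on a different graph, not a counterexample to the specific question above) is path cancellation in a linear Gaussian model: in the DAG X → Y → Z with an extra edge X → Z, effects chosen to cancel make X and Z exactly independent even though they are d-connected. The coefficients and sample size below are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.standard_normal(n)
y = 1.0 * x + rng.standard_normal(n)                 # Y = X + noise
z = -1.0 * x + 1.0 * y + rng.standard_normal(n)      # Z = -X + Y + noise; the X-effects cancel

# Cov(X, Z) = -1 + 1*1 = 0, and (X, Z) are jointly Gaussian, so X is independent of Z
# even though X and Z are d-connected (there is a direct edge X -> Z).
print(np.corrcoef(x, z)[0, 1])   # approximately 0
```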
What is Markov relative?
“Markov” is used in the standard memoryless sense. By definition, the graph G represents any distribution p in which each variable on the graph is independent of its non-descendants (its “past” under a topological ordering) given its parents. This is the Markov property.
Ilya is discussing probability distributions p that may or may not be represented by the graph G. If every variable in p is independent of its non-descendants given its parents in G, then you can use d-separation in G to reason about independences in p.
Theorem 1.2.4: If sets X and Y are d-separated by Z in a DAG G, then X is independent of Y conditional on Z in every distribution compatible with G…
Pearl’s textbook cites Verma and Pearl, 1988, for the proof, but I don’t have access to it.