> But it doesn’t save you from having to write out all the combinations.

It saves you from having to write them until needed, at which point they can be extracted by walking through the graph rather than doing a lookup on a superexponential table.
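To make that concrete, here is a rough sketch (toy code, with a hypothetical sprinkler-style net, not anything from the article) of reading a conditional independence off the graph by walking it; it uses the moralized-ancestral-graph test, which gives the same answers as d-separation:

```python
from itertools import combinations

# Hypothetical sprinkler-style DAG (node -> list of its parents).
PARENTS = {
    "Rain": [],
    "Sprinkler": ["Rain"],
    "WetGrass": ["Rain", "Sprinkler"],
    "SlipperySidewalk": ["WetGrass"],
}

def ancestors(nodes):
    """All ancestors of `nodes`, including the nodes themselves."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(PARENTS[n])
    return seen

def independent(x, y, given):
    """True iff x is d-separated from y given the set `given`.

    Moralized-ancestral-graph test: keep only ancestors of {x, y} plus the
    conditioning set, marry co-parents and drop arrow directions, delete the
    conditioning nodes, and check whether x can still reach y.
    """
    keep = ancestors({x, y} | set(given))
    adj = {n: set() for n in keep}
    for child in keep:
        for p in PARENTS[child]:              # undirected parent-child edges
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(PARENTS[child], 2):   # marry co-parents
            adj[p].add(q)
            adj[q].add(p)
    seen, stack = set(), [x]
    while stack:                              # walk the graph, skipping `given`
        n = stack.pop()
        if n in given or n in seen:
            continue
        if n == y:
            return False                      # found an active path
        seen.add(n)
        stack.extend(adj[n] - seen)
    return True

print(independent("Rain", "SlipperySidewalk", {"WetGrass"}))  # True
print(independent("Rain", "SlipperySidewalk", set()))         # False
```

The point is just that the check only ever touches the part of the graph relevant to the query, instead of indexing into a table with an entry for every (X, Y, Z) triple.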
> You also do not motivate why someone would be interested in a big list of conditional independencies for its own sake. Surely, what we ultimately want to know is e.g. the probability that it will rain tomorrow, not whether or not rain is correlated with sprinklers.
Yes, the question was what they would care about if they were only interested in predictions. And so I think I’ve motivated why they would care about conditional (in)dependencies: it determines the (minimal) set of variables they need to look at! Whatever minimal method of representing their knowledge will then have these arrows (from one of the networks that fits the data).
If you require that causality definitions be restricted to (uncorrelated) counterfactual operations (like Pearl’s “do” operation), then sure, the Armcharians won’t do that specific computation. But if you use the definition of causality from this article, then I think it’s clear that efficiency considerations will lead them to use something isomorphic to it.
> It saves you from having to write them until needed
I was saying that not every independence property is representable as a Bayesian network.
> Whatever minimal method of representing their knowledge will then have these arrows (from one of the networks that fits the data).
No! Once you have learned a distribution using Bayesian network-based methods, the minimal representation of it is the set of factors. You don’t need the direction of the arrows any more.
> I was saying that not every independence property is representable as a Bayesian network.
You mean when all variables are independent, or some other class of cases?
> No! Once you have learned a distribution using Bayesian network-based methods, the minimal representation of it is the set of factors. You don’t need the direction of the arrows any more.
Read the rest: you need the arrows if you want to efficiently look up the conditional (in)dependencies.
> You mean when all variables are independent, or some other class of cases?
Well, there are doubly-exponentially many possibilities…
The usual example for Markov networks is four variables connected in a square. The corresponding independence assumption is that any two opposite corners are independent given the other two corners. There is no Bayesian network encoding exactly that.
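If it helps, here is a quick numerical sketch of that square (arbitrary toy potentials, nothing special about the numbers): the joint factors over the four edges, both “opposite corners” independencies hold, and adjacent corners stay coupled:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
# Arbitrary positive pairwise potentials for the edges A-B, B-C, C-D, D-A.
f_ab, f_bc, f_cd, f_da = (rng.uniform(0.5, 2.0, size=(2, 2)) for _ in range(4))

# Unnormalized joint over binary A, B, C, D, then normalize.
joint = np.zeros((2, 2, 2, 2))
for a, b, c, d in itertools.product(range(2), repeat=4):
    joint[a, b, c, d] = f_ab[a, b] * f_bc[b, c] * f_cd[c, d] * f_da[d, a]
joint /= joint.sum()

def indep_given_rest(x_axis, y_axis):
    """Check X ⊥ Y | (the other two), via P(x,y,z) P(z) == P(x,z) P(y,z)."""
    p_xz = joint.sum(axis=y_axis, keepdims=True)
    p_yz = joint.sum(axis=x_axis, keepdims=True)
    p_z = joint.sum(axis=(x_axis, y_axis), keepdims=True)
    return np.allclose(joint * p_z, p_xz * p_yz)

# Axes: 0 = A, 1 = B, 2 = C, 3 = D.
print(indep_given_rest(0, 2))  # A ⊥ C | {B, D}: True
print(indep_given_rest(1, 3))  # B ⊥ D | {A, C}: True
print(indep_given_rest(0, 1))  # A ⊥ B | {C, D}: False, adjacent corners stay coupled
```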
> you need the arrows if you want to efficiently look up the conditional (in)dependencies.
But again, why would you want that? As I said in the grand^(n)parent, you don’t need to when doing inference.
> The usual example for Markov networks is four variables connected in a square. The corresponding independence assumption is that any two opposite corners are independent given the other two corners. There is no Bayesian network encoding exactly that.
Okay, I’m recalling the “troublesome” cases that Pearl brings up, which gives me a better idea of what you mean. But this is not a counterexample. It just means that you can’t do it on a Bayes net with binary nodes. You can still represent that situation by merging (either pair of) the screening nodes into one node that covers all combinations of possibilities between them.
Do you have another example?
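Roughly what I have in mind (toy CPTs, arbitrary numbers): treat (B, D) as a single four-state node and hang A and C off it, which does encode A ⊥ C given B and D:

```python
import numpy as np

rng = np.random.default_rng(1)
p_bd = rng.dirichlet(np.ones(4))            # P(BD): one node, four joint states of (B, D)
p_a_bd = rng.dirichlet(np.ones(2), size=4)  # P(A | BD), shape (4, 2)
p_c_bd = rng.dirichlet(np.ones(2), size=4)  # P(C | BD), shape (4, 2)

# Joint P(A, BD, C) = P(BD) P(A|BD) P(C|BD), axes ordered (A, BD, C).
joint = p_bd[None, :, None] * p_a_bd.T[:, :, None] * p_c_bd[None, :, :]

# A ⊥ C given the merged node: P(a,k,c) P(k) == P(a,k) P(k,c) for all a, k, c.
p_k = joint.sum(axis=(0, 2), keepdims=True)
p_ak = joint.sum(axis=2, keepdims=True)
p_kc = joint.sum(axis=0, keepdims=True)
print(np.allclose(joint * p_k, p_ak * p_kc))  # True
```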
> But again, why would you want that? As I said in the grand^(n)parent, you don’t need to when doing inference.
Sure you do: you want to know which and how many variables you have to look up to make your prediction.
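Concretely, one version of “which and how many variables” is the Markov blanket of the node you are predicting; here is a sketch on the same kind of hypothetical sprinkler-style net as above:

```python
# Same hypothetical sprinkler-style DAG (node -> list of its parents).
PARENTS = {
    "Rain": [],
    "Sprinkler": ["Rain"],
    "WetGrass": ["Rain", "Sprinkler"],
    "SlipperySidewalk": ["WetGrass"],
}

def markov_blanket(node):
    """Parents, children, and co-parents: the variables that, once known,
    screen `node` off from every other variable in the net."""
    children = [n for n, ps in PARENTS.items() if node in ps]
    coparents = {p for child in children for p in PARENTS[child]}
    return (set(PARENTS[node]) | set(children) | coparents) - {node}

print(markov_blanket("Rain"))  # {'Sprinkler', 'WetGrass'}
```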
> merging (either pair of) the screening nodes into one node
Then the network does not encode the conditional independence between the two variables that you merged.
The task you have to do when making predictions is marginalization: in order to compute P(Rain|WetGrass), you need to sum P(Rain, X, Y, Z | WetGrass) over all possible values of the variables X, Y, Z that you didn’t observe. Here it is very helpful to have the distribution factored into a tree, since that can make it feasible to do variable elimination (or related algorithms like belief propagation). But the directions on the edges in the tree don’t matter; you can start at any leaf node and work across.
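For example, on a toy sprinkler net with made-up CPT numbers, the whole computation is just multiplying and summing tables:

```python
import numpy as np

# Factors of the joint P(R, S, W) = P(R) P(S|R) P(W|R,S); toy numbers.
p_r = np.array([0.8, 0.2])                    # P(Rain): [no, yes]
p_s_r = np.array([[0.6, 0.4],                 # P(Sprinkler | Rain), rows indexed by R
                  [0.9, 0.1]])
p_w_rs = np.array([[[1.0, 0.0], [0.1, 0.9]],  # P(WetGrass | R, S), last axis = W
                   [[0.2, 0.8], [0.01, 0.99]]])

# Condition on WetGrass = 1, then sum out the unobserved Sprinkler.
f = p_r[:, None] * p_s_r * p_w_rs[:, :, 1]    # f[r, s] = P(r, s, W=1)
p_rw = f.sum(axis=1)                          # P(r, W=1)
print(p_rw / p_rw.sum())                      # P(Rain | WetGrass=1)
```

All the factors enter symmetrically as tables to be multiplied and summed; the arrow directions mattered when the model was written down, not here.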