Basic statistics question: if we find that 99% of all people are irrational, but “only” 90% of millionaires are irrational, is that evidence that rationality does lead to (an increased probability of) winning, or is it only evidence that rationality is correlated with winning? For instance, how do I know that millionaires aren’t more rational simply because they can afford to go to CFAR workshops and have more free time to read LessWrong?
I.e. knowing only that 99% of all people are A but “only” 90% of millionaires are A, how do I adjust my respective probabilities that:
A → millionaires
Millionaires → A
Unknown factor C causes both A and millionaires
It feels like I ought to assign some additional likelihood to each of these 3 cases, but I’m not sure how to split it up. Maybe the answer is simply, “gather more evidence to attempt to tease out the proper causal relationship”.
This is a causal question, not a statistical question. You answer it by implementing the relevant intervention, usually by randomization, or maybe by finding a natural experiment, or maybe [lots of other ways people thought of].
You can’t in general use observational data (i.e. what you call “evidence”) to figure out causal relationships. You need causal assumptions somewhere.
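What the observational numbers do pin down, with no causal assumptions at all, is the strength of the bare association. A quick Bayes’-rule check, a sketch using only the 99% and 90% figures from the question:

```python
# Bayes' rule on the stated numbers alone -- no causal claim involved:
# how much likelier is a rational person to be a millionaire than average?
p_rational = 1 - 0.99                 # 1% of all people are rational
p_rational_given_mill = 1 - 0.90      # 10% of millionaires are rational

# P(millionaire | rational) / P(millionaire)
#   = P(rational | millionaire) / P(rational)
print(p_rational_given_mill / p_rational)   # 10.0 -- a tenfold enrichment
```

So rational people are ten times likelier than average to be millionaires, but this by itself says nothing about which of the three arrows produces the association.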
What do you think of this challenge of detecting causality from nothing but a set of pairs of values of unnamed variables?
You can do it with enough causal assumptions (i.e. not “from nothing”). There is a series of magical papers, e.g. this:
http://www.cs.helsinki.fi/u/phoyer/papers/pdf/hoyer2008nips.pdf
which show you can use additive noise assumptions to orient edges.
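To make the additive-noise idea concrete, here is a minimal sketch (my own illustration under simplifying assumptions, not the algorithm from the paper): fit a regression in each direction and check whether the residuals look independent of the input. Under a model y = f(x) + n with n independent of x, only the causal direction should pass the check; the binned dependence score below is a crude stand-in for a proper independence test such as HSIC.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 5000)
y = x ** 3 + rng.uniform(-1, 1, 5000)   # true model: x -> y, additive noise

def dependence_score(inp, out, degree=5, bins=10):
    """Regress out on inp, then measure how much the residual
    distribution shifts across quantile bins of inp (~0 if independent)."""
    resid = out - np.polyval(np.polyfit(inp, out, degree), inp)
    edges = np.quantile(inp, np.linspace(0, 1, bins + 1))
    idx = np.clip(np.digitize(inp, edges[1:-1]), 0, bins - 1)
    means = np.array([resid[idx == b].mean() for b in range(bins)])
    stds = np.array([resid[idx == b].std() for b in range(bins)])
    return means.std() + stds.std()

print("x -> y score:", dependence_score(x, y))   # small: residuals independent of x
print("y -> x score:", dependence_score(y, x))   # noticeably larger: wrong direction
```

The asymmetry is the whole trick: the additive-noise assumption makes the two directions distinguishable from purely observational data.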
I have a series of papers:
http://www.auai.org/uai2012/papers/248.pdf
http://arxiv.org/abs/1207.5058
which show you don’t even need conditional independences to orient edges. For example, if the true DAG is this:
1 → 2 → 3 → 4, 1 ← u1 → 3, 1 ← u2 → 4,
and we observe p(1, 2, 3, 4) (no conditional independences in this marginal), I can recover the graph exactly with enough data. (The graph would be causal if we assume the underlying true graph is, otherwise it’s just a statistical model).
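A quick simulation of this example (a hedged sketch; I’ve made it linear-Gaussian with arbitrary coefficients) lets you verify the “no conditional independences” claim directly: no pair of observed variables has a vanishing partial correlation given any subset of the others.

```python
from itertools import combinations
import numpy as np

# The graph: 1 -> 2 -> 3 -> 4, with hidden confounders u1 -> {1, 3}
# and u2 -> {1, 4}. Only x1..x4 are observed.
rng = np.random.default_rng(0)
n = 200_000
u1, u2 = rng.normal(size=n), rng.normal(size=n)
x1 = u1 + u2 + rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
x3 = 0.8 * x2 + u1 + rng.normal(size=n)
x4 = 0.8 * x3 + u2 + rng.normal(size=n)
data = {1: x1, 2: x2, 3: x3, 4: x4}

def partial_corr(a, b, cond):
    """Correlation of a and b after regressing out the conditioning set."""
    if cond:
        Z = np.column_stack([data[c] for c in cond] + [np.ones(n)])
        a = a - Z @ np.linalg.lstsq(Z, a, rcond=None)[0]
        b = b - Z @ np.linalg.lstsq(Z, b, rcond=None)[0]
    return np.corrcoef(a, b)[0, 1]

# Every pair, given every subset of the remaining observed variables:
for i, j in combinations(data, 2):
    rest = [k for k in data if k not in (i, j)]
    for r in range(len(rest) + 1):
        for cond in combinations(rest, r):
            rho = partial_corr(data[i], data[j], cond)
            print(f"corr({i},{j} | {cond}) = {rho:+.3f}")
```

Every printed value is clearly nonzero, so independence-based methods get no traction here, which is what makes the exact-recovery result surprising.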
People’s intuitions about what’s possible in causal discovery aren’t very good.
It would be good if statisticians and machine learning / comp. sci. people came together to hash out their differences regarding causal inference.
Gelman seems skeptical.
I saw that, but I didn’t see much substance to his remarks, nor in the comments.
Here is a paper surveying methods of causal analysis for such non-interventional data, and summarising the causal assumptions that they make:
“New methods for separating causes from effects in genomics data”, Alexander Statnikov, Mikael Henaff, Nikita I. Lytkin, Constantin F. Aliferis
It feels like I ought to assign some additional likelihood to each of these 3 cases, but I’m not sure how to split it up.
Two things:
1) Your prior probabilities. If before getting your evidence you expect that hypothesis H1 is twice as likely as H2, and the new evidence is equally likely under both H1 and H2, you should update so that the new H1 remains twice as likely as H2.
2) Conditional probabilities of the evidence under different hypotheses. Let’s suppose that hypothesis H1 predicts a specific piece of evidence E with probability 10%, and hypothesis H2 predicts E with probability 30%. After seeing E, the ratio between H1 and H2 should be multiplied by 1:3.
The first part means simply: before the (fictional) research about rationality among millionaires was done, what probability would you assign to each of your hypotheses?
The second part means: if we know that 99% of all people are irrational, what percentage of irrational millionaires would you expect, assuming that e.g. the first hypothesis, “rationality causes millionaires”, is true? Would you expect to see 95% or 90% or 80% or 50% or 10% or 1% of irrational millionaires? Make your probability distribution. Now do the same thing for each of the remaining hypotheses. -- Ta-da, the research is over and we know that the percentage of irrational millionaires is 90%, not more, not less. How good was each hypothesis at predicting this specific outcome?
(I don’t mean to imply that doing either of these estimates is easy. It is just the way it should be done.)
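For concreteness, here is a toy version of that procedure (every number below is invented purely for illustration; the real work is in choosing defensible priors and likelihoods):

```python
# Bayesian update over the three causal hypotheses, given the observed
# "90% of millionaires are irrational" result. All numbers are made up.
priors = {
    "rationality -> millionaires": 0.3,
    "millionaires -> rationality": 0.3,
    "common cause C":              0.4,
}
likelihoods = {   # P(observing the 90% figure | hypothesis) -- invented
    "rationality -> millionaires": 0.20,
    "millionaires -> rationality": 0.15,
    "common cause C":              0.10,
}

# Multiply prior by likelihood, then renormalise over the hypotheses.
joint = {h: priors[h] * likelihoods[h] for h in priors}
total = sum(joint.values())
for h, p in joint.items():
    print(f"P({h} | data) = {p / total:.3f}")
```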
Maybe the answer is simply, “gather more evidence”
Gathering more evidence is always good (ignoring the costs of gathering it), but sometimes we need to make an estimate based on the data we already have.