orthonormal comments on A question about Eliezer

orthonormal 20 Apr 2012 19:04 UTC
11 points
To explain the issue here in intuitive terms: let’s say we have the hypothesis that Alice owns a cat, and we start with the prior probability of a person owning a cat (let’s say 1 in 20), and then update on the evidence: she recently moved from an apartment building that doesn’t allow cats to one that does (3 times more likely if she has a cat than if she doesn’t), she regularly goes to a pet store now (7 times more likely if she has a cat than if she doesn’t), and when she goes out there’s white hair on her jacket sleeves (5 times more likely if she has a cat than if she doesn’t). Putting all of these together by Bayes’ Rule, we end up 85% confident she has a cat, but in fact we’re wrong: she has a dog. And thinking about it in retrospect, we shouldn’t have gotten 85% certainty of cat ownership. How did we get so confident in a wrong conclusion?
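
For anyone who wants to check the arithmetic, here is a minimal sketch of the odds-form update described above. The function name `naive_posterior` is just an illustrative choice; the prior and the three likelihood ratios are the ones from the cat example, and the code deliberately treats the ratios as independent, which is the very assumption in question.

```python
from fractions import Fraction

def naive_posterior(prior, likelihood_ratios):
    """Odds-form Bayes update that multiplies every ratio as if independent."""
    odds = prior / (1 - prior)      # 1 in 20 becomes odds of 1:19
    for ratio in likelihood_ratios:
        odds *= ratio               # each piece of evidence scales the odds
    return odds / (1 + odds)        # convert odds back to a probability

# Prior 1/20; ratios for the apartment move, pet store visits, white hair.
posterior = naive_posterior(Fraction(1, 20), [3, 7, 5])
print(posterior, round(float(posterior), 3))
```

The posterior odds come out to 105:19, i.e. a probability of 105/124, which is where the figure of roughly 85% comes from.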

It’s because, while each of those likelihoods is valid in isolation, they’re not independent: there is a big chunk of people who move to pet-friendly apartments and go to pet stores regularly and have pet hair on their sleeves, and not all of them are cat owners. Those people are called pet owners in general, but even if we didn’t know that, a good Bayesian would have kept tabs on the cross-correlations and noted that the straightforward estimate would thereby be invalid.
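
To make the non-independence concrete, here is a toy population, sketched under loudly stated assumptions: the split into 5% cat owners, 10% other pet owners, and 85% petless people is made up (only the 1-in-20 cat prior comes from the example), as is the assumption that each observation is equally likely for any kind of pet owner. The helper `p_given_no_pet` is hypothetical; it just picks the petless probabilities so that each observation, taken alone, has exactly the 3x, 7x, and 5x ratios quoted above.

```python
from fractions import Fraction as F
from math import prod

# Hypothetical population shares (only the 1/20 cat prior is from the example).
SHARE = {"cat": F(1, 20), "other_pet": F(2, 20), "no_pet": F(17, 20)}
P_GIVEN_PET = F(1, 2)   # assumed P(observation | any pet owner)
RATIOS = [3, 7, 5]      # per-observation ratios, cat vs. not-cat

def p_given_no_pet(ratio):
    """P(observation | no pet) chosen so P(obs | cat) / P(obs | not cat) == ratio."""
    not_cat = SHARE["other_pet"] + SHARE["no_pet"]
    return P_GIVEN_PET * (not_cat / ratio - SHARE["other_pet"]) / SHARE["no_pet"]

# Probability of seeing all three observations within each group, assuming
# they are independent once the group is known.
joint = {
    "cat": P_GIVEN_PET ** 3,
    "other_pet": P_GIVEN_PET ** 3,
    "no_pet": prod(p_given_no_pet(r) for r in RATIOS),
}

# Exact posterior: sum over the groups instead of multiplying the ratios.
exact = SHARE["cat"] * joint["cat"] / sum(SHARE[g] * joint[g] for g in SHARE)

# Naive posterior: multiply the three ratios as if they were independent.
odds = SHARE["cat"] / (1 - SHARE["cat"]) * prod(RATIOS)
naive = odds / (1 + odds)

print(round(float(naive), 3), round(float(exact), 3))
```

In this made-up population every single ratio is correct on its own, yet the exact posterior is about 33% rather than 85%: the three observations mostly tell us "pet owner", and once that is established, repeating it adds almost nothing toward "cat" specifically.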

EDITED TO ADD: So the difference between that and the IQ test example is that you don’t expect there to be an exceptional number of people who get the first two questions right and then do poorly on the rest of the test. The analogue there would be that, even though ability to solve mathematical problems correlates with ability to solve language problems, you should only count that correlation once. If a person does well on a slate of math problems, that’s evidence they’ll do well on language problems, but doing well on a second math test doesn’t count as much additional evidence that they’ll do well on language problems. (That is, there are sharply diminishing returns.)
- semianonymous 21 Apr 2012 4:41 UTC
  −2 points
  Parent
  The cat is defined independently of being a combination of the owner’s traits; that is the difference between the cat and IQ or any other psychological measure. If we were to say ‘pet’, the formula would have worked, and it would have worked even better if we had a purely black-box classification of people into those who have a bunch of traits and those who don’t, regardless of the cause (a pet, a cat, a weird fetish for pet-related stuff).
  
  It is, however, the case that narcissism does match sociopathy, to the point that the difference between the two is not very well defined. Anyhow, we can restate the problem and consider it a guess at the properties of the utility function, adding extra verbiage.
  
  The analogy on the math problems is good but what we are compensating for is miscommunication, status gaming, and such, by normal people.
  
  I would suggest, actually, not the Bayesian approach, but statistical prediction rule or trained neural network.
  - othercriteria 22 Apr 2012 14:05 UTC
    0 points
    Parent
    
    > I would suggest, actually, not the Bayesian approach, but statistical prediction rule or trained neural network.
    
    Given the asymptotic efficiency of the Bayes decision rule in a broad range of settings, those alternatives would give equivalent or less accurate classifications if enough training data (and computational power) were available. If this argument is not familiar, you might want to consult Chapter 2 of The Elements of Statistical Learning.