People do it selectively, though. When someone takes an IQ test and gets a high score, you assume that person has a high IQ, for instance, and don’t postulate the existence of ‘low-IQ people who solved the first two problems on the test’, who would then be more likely to solve other, different problems while having ‘low IQ’, and ultimately score high while having ‘low IQ’.
To explain the issue here in intuitive terms: let’s say we have the hypothesis that Alice owns a cat, and we start with the prior probability of a person owning a cat (let’s say 1 in 20), and then update on the evidence: she recently moved from an apartment building that doesn’t allow cats to one that does (3 times more likely if she has a cat than if she doesn’t), she regularly goes to a pet store now (7 times more likely if she has a cat than if she doesn’t), and when she goes out there’s white hair on her jacket sleeves (5 times more likely if she has a cat than if she doesn’t). Putting all of these together by Bayes’ Rule, we end up 85% confident she has a cat, but in fact we’re wrong: she has a dog. And thinking about it in retrospect, we shouldn’t have gotten 85% certainty of cat ownership. How did we get so confident in a wrong conclusion?
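For anyone who wants to check the arithmetic, here is a minimal sketch of the naive update in odds form. The function name `naive_posterior` is just an illustrative choice; the prior (1 in 20) and the likelihood ratios (3, 7, 5) are the ones from the example, and the code assumes, as the naive calculation does, that the three observations are independent given cat ownership.

```python
from fractions import Fraction


def naive_posterior(prior, likelihood_ratios):
    """Multiply the prior odds by every likelihood ratio, then convert back to a probability.

    This treats each piece of evidence as independent given the hypothesis,
    which is exactly the assumption under discussion.
    """
    odds = prior / (1 - prior)          # 1/20 -> odds of 1:19
    for ratio in likelihood_ratios:
        odds *= ratio                   # one multiplicative update per observation
    return odds / (1 + odds)


# Numbers taken from the example: moved apartments (3x), pet store (7x), white hair (5x).
posterior = naive_posterior(Fraction(1, 20), [3, 7, 5])
print(posterior, float(posterior))      # 105/124, about 0.847
```

The combined ratio is 3 × 7 × 5 = 105, so the posterior odds are 105:19, which is where the roughly 85% comes from.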
It’s because, while each of those likelihoods is valid in isolation, they’re not independent: there is a big chunk of people who move to pet-friendly apartments and go to pet stores regularly and have pet hair on their sleeves, and not all of them are cat owners. Those people are called pet owners in general, but even if we didn’t know that, a good Bayesian would have kept tabs on the cross-correlations and noted that the straightforward estimate would therefore be invalid.
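To make the double-counting concrete, here is a toy model. Every number in it is a made-up assumption for illustration (it does not reproduce the 3x, 7x, 5x ratios above): cat owners and dog owners are equally common, and each of three clues is equally likely for both kinds of pet owner. The clues are independent given the three-way class (cat, dog, none), but not given the binary split cat vs. not-cat, which is what the naive calculation uses.

```python
# Hypothetical toy population; all numbers are illustrative assumptions.
PRIOR = {"cat": 0.05, "dog": 0.05, "none": 0.90}
# Probability of seeing any single clue (new apartment, pet store, hair on sleeves),
# assumed identical for all three clues and for both kinds of pet owner.
P_CLUE = {"cat": 0.5, "dog": 0.5, "none": 0.1}
N_CLUES = 3


def exact_posterior_cat():
    """P(cat | all clues), using the full cat/dog/none model."""
    joint = {h: PRIOR[h] * P_CLUE[h] ** N_CLUES for h in PRIOR}
    return joint["cat"] / sum(joint.values())


def naive_posterior_cat():
    """P(cat | all clues) if each clue's cat-vs-not-cat ratio is multiplied in separately."""
    p_not_cat = 1 - PRIOR["cat"]
    # Marginal chance of a clue among everyone who does not own a cat.
    p_clue_not_cat = (PRIOR["dog"] * P_CLUE["dog"]
                      + PRIOR["none"] * P_CLUE["none"]) / p_not_cat
    ratio = P_CLUE["cat"] / p_clue_not_cat   # valid for one clue in isolation
    odds = PRIOR["cat"] / p_not_cat * ratio ** N_CLUES
    return odds / (1 + odds)


print(f"naive: {naive_posterior_cat():.3f}, exact: {exact_posterior_cat():.3f}")
```

Under these assumptions the naive estimate comes out well above the exact one. The exact posterior can never pass 50% here, because nothing in the clues distinguishes a cat owner from a dog owner.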
EDITED TO ADD: So the difference between that and the IQ test example is that you don’t expect there to be an exceptional number of people who get the first two questions right and then do poorly on the rest of the test. The analogue there would be that, even though ability to solve mathematical problems correlates with ability to solve language problems, you should only count that correlation once. If a person does well on a slate of math problems, that’s evidence they’ll do well on language problems, but doing well on a second math test doesn’t add much further evidence that they’ll do well on word problems. (That is, there are sharply diminishing returns.)
The cat is defined independently of any combination of the owner’s traits; that is the difference between the cat and IQ or any other psychological measure. If we were to say ‘pet’, the formula would have worked, and it would work even better if we had a purely black-box classifier that sorts people into those who have a bunch of traits vs. those who don’t, regardless of the cause (a pet, a cat, a weird fetish for pet-related stuff).
It is, however, the case that narcissism does match sociopathy, to the point that the difference between the two is not very well defined. Anyhow, we can restate the problem and consider it a guess at the properties of the utility function, adding extra verbiage.
The analogy with the math problems is good, but what we are compensating for is miscommunication, status gaming, and the like by normal people.
I would actually suggest not the Bayesian approach but a statistical prediction rule or a trained neural network.
Given the asymptotic efficiency of the Bayes decision rule in a broad range of settings, those alternatives would give equivalent or less accurate classifications if enough training data (and computational power) were available. If this argument is not familiar, you might want to consult Chapter 2 of The Elements of Statistical Learning.
I don’t think you understood DanielVarga’s point. He’s saying that the numbers available for some of those features already have an unknown amount of the other features factored in. In other words, if you update on each feature separately, you’ll end up double-counting an unknown amount of the data. (Hopefully this explanation is reasonably accurate.)
http://en.wikipedia.org/wiki/Conditional_independence
I did understand his point. The issue is that the psychological traits are defined as whatever is behind the correlation, be it brain lesion A, or brain lesion B, or a weird childhood, or the like. They are very broad and are defined to include the ‘other features’.
It is probably better to drop the word ‘sociopath’ and just say ‘selfish’, but then it is not immediately apparent why, e.g., arrogance not backed by achievements is predictive of selfishness, even though it very much is, since it is a false signal of capability.
I don’t think it matters how it is defined… One still shouldn’t double count the evidence.
You can eliminate the evidence that you consider double-counted, for example grandiose self-worth and grandiose plans, though both need to be present: grandiose self-worth without grandiose plans would just indicate some sort of miscommunication (and the self-worth metric is more subjective), and each alone is a much poorer indicator than the two combined.
In any case, accurate estimation of anything of this kind is very difficult. In general, one just adopts a strategy under which sociopaths would not get a sufficient selfish payoff from cheating it. Altruism is a far cheaper signal for non-selfish agents: in very simple terms, if you give someone $3 for donating $4 to a very well verified charity, those who value $4 in charity above $1 in pocket will accept the deal. You just ensure that there is no selfish gain in transactions, and you’re fine. If you don’t adopt an anti-cheat strategy, you will be found and exploited with very high confidence, because, unlike in the iterated prisoner’s dilemma, cheaters get to choose whom to play with and get to send signals that make easily cheatable agents play with them. A bad strategy is far more likely to be exploited than any conservative estimate would suggest.
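The $3-for-$4 deal can be written down as a one-line acceptance rule. This is only a sketch under a simplifying assumption that is not in the comment itself: each agent is summarized by a single hypothetical number, `charity_weight`, meaning how much a dollar reaching the charity is worth to them relative to a dollar in their own pocket.

```python
def accepts_deal(charity_weight, payment=3.0, donation=4.0):
    """Return True if the agent comes out ahead by its own valuation.

    charity_weight is a hypothetical parameter: the value the agent places on
    $1 going to charity, measured in dollars of its own money.
    """
    net_pocket = payment - donation                 # -$1 with the numbers from the comment
    return net_pocket + charity_weight * donation > 0


print(accepts_deal(0.0))   # purely selfish agent: loses $1, gains nothing it values
print(accepts_deal(0.5))   # values the $4 donation at $2, which outweighs the $1 cost
```

With the numbers from the comment, the deal is accepted only when the weight is above 0.25, that is, when $4 in charity is worth more to the agent than $1 in pocket. A purely selfish agent has no reason to take it.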