Applied Bayes’ Theorem: Reading People

Kaj_Sotala, 30 Jun 2010 17:21 UTC

Or, how to recognize Bayes’ theorem when you meet one making small talk at a cocktail party.

Knowing the theory of rationality is good, but it is of little use unless we know how to apply it. Unfortunately, humans tend to be poor at applying raw theory, instead needing several examples before it becomes instinctive. I found some very useful examples in the book Reading People: How to Understand People and Predict Their Behavior—Anytime, Anyplace. While I didn’t think that it communicated the skill of actually reading people very well, I did notice that one chapter (titled “Discovering Patterns: Learning to See the Forest, Not Just the Trees”) could almost have been a collection of Less Wrong posts. It also serves as an excellent example of applying Bayes’ theorem in everyday life.

In “What is Bayesianism?” I said that the first core tenet of Bayesianism is “Any given observation has many different possible causes”. Reading People says:

If this book could deliver but one message, it would be that to read people effectively you must gather enough information about them to establish a consistent pattern. Without that pattern, your conclusions will be about as reliable as a tarot card reading.

In fact, the author is saying that Bayes’ theorem applies when you’re trying to read people (if this is not immediately obvious, just keep reading). Any particular piece of evidence about a person could have various causes. For example, in a later chapter we are offered a list of possible reasons why someone may have dressed inappropriately for an occasion. They might (1) be seeking attention, (2) lack common sense, (3) be self-centered and insensitive to others, (4) be trying to show that they are spontaneous, rebellious, or nonconformists and don’t care what other people think, (5) not have been taught how to dress and act appropriately, (6) be trying to imitate someone they admire, (7) value comfort and convenience over all else, or (8) simply not have the right attire for the occasion.

Similarly, very short hair on a man might indicate that he (1) is in the military, or was at some point in his life, (2) works for an organization that demands very short hair, such as a police force or fire department, (3) is trendy, artistic or rebellious, (4) is conservative, (5) is undergoing or recovering from a medical treatment, (6) thinks he looks better with short hair, (7) plays sports, or (8) keeps his hair short for practical reasons.

So much for reading people being easy. This, again, is the essence of Bayes’ theorem: even though somebody being in the military might almost certainly mean that they’d have short hair, their having short hair does not necessarily mean that they are in the military. On the other hand, if someone has short hair, is clearly knowledgeable about weapons and tactics, displays a no-nonsense attitude, is in good shape, and has a very Spartan home… well, though it’s still not for certain, it seems likely to me that of all the people having all of these attributes, quite a few of them are in the military or in similar occupations.
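To make the contrast between one clue and a consistent pattern concrete, here is a minimal sketch in Python. Every number in it is made up for illustration (the base rate and each pair of likelihoods are assumptions, not figures from the book), and it treats the traits as independent given the hypothesis, which real traits are not. It only shows the shape of the update: each trait multiplies the odds by its likelihood ratio.

```python
# All numbers below are hypothetical, chosen only to illustrate the update.
prior_military = 0.01  # assumed base rate of "military or similar occupation"

# Each entry: (P(trait | military), P(trait | not military)), all assumed.
traits = {
    "short hair": (0.95, 0.30),
    "knows weapons and tactics": (0.90, 0.10),
    "no-nonsense attitude": (0.70, 0.20),
    "in good shape": (0.80, 0.30),
    "Spartan home": (0.50, 0.10),
}


def posterior_after(trait_names, prior):
    """Multiply the prior odds by each trait's likelihood ratio.

    Assumes the traits are independent given the hypothesis, which is a
    simplification made for this sketch.
    """
    odds = prior / (1.0 - prior)
    for name in trait_names:
        p_if_true, p_if_false = traits[name]
        odds *= p_if_true / p_if_false
    return odds / (1.0 + odds)


short_hair_only = posterior_after(["short hair"], prior_military)
all_traits = posterior_after(list(traits), prior_military)

print(f"short hair alone: {short_hair_only:.3f}")
print(f"all five traits:  {all_traits:.3f}")
```

With these assumed inputs, short hair alone leaves the hypothesis unlikely, while the full pattern makes it the leading explanation without making it certain.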

The book offers a seven-step guide for finding patterns in people. I’ll go through them one at a time, pointing out what they say in Bayesian and heuristic/bias terms. Note that this is not a definitive list: if you can come up with more Bayesian angles to the book, post them in the comments.

1. Start with the person’s most striking traits, and as you gather more information see if his other traits are consistent or inconsistent.

As computationally bounded agents, we can’t simply take in all the available data at once: we have to start off with some particularly striking traits and start building a picture from there. However, humans are notorious for anchoring too much (Anchoring and Adjustment), so we are reminded to actively seek disconfirmation of any initial theory we have.

I constantly test additional information against my first impression, always watching for patterns to develop. Each piece of the puzzle—a person’s appearance, her tone of voice, hygiene and so on—may validate my first impression, disprove it, or have little impact on it. If most of the new information points in a different direction than my first impression did, I revise that impression. Then I consider whether my revised impression holds up as even more clues are revealed—and revise it again, if need be.

Here, the author is keeping in mind Conservation of Expected Evidence. If you could anticipate in advance the direction of any update, you should just update now. You should not expect to be able to get the right answer right away and never need to seriously update it. Nor should you expect to suddenly encounter some piece of evidence that, on its own, would make you switch to becoming confident in something completely different. An ideal Bayesian agent will expect their beliefs to be in a constant state of gradual revision as the evidence comes in, and people with human cognitive architectures should also make an explicit effort to make their impressions update as fluidly as possible.
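Conservation of Expected Evidence can be checked numerically. The sketch below uses invented probabilities (the prior and both likelihoods are assumptions for illustration) and verifies that the posterior, averaged over whether the clue shows up or not, equals the prior. The clue can move you up or down, but you cannot know in advance which way.

```python
import math

# Hypothetical numbers for illustration only.
p_h = 0.3              # assumed prior that the first impression is right
p_e_given_h = 0.8      # assumed P(clue shows up | impression is right)
p_e_given_not_h = 0.4  # assumed P(clue shows up | impression is wrong)

# Total probability of seeing the clue at all.
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Posterior if the clue appears, and posterior if it does not.
post_if_e = p_e_given_h * p_h / p_e
post_if_not_e = (1 - p_e_given_h) * p_h / (1 - p_e)

# Weight each posterior by how likely that outcome is.
expected_posterior = post_if_e * p_e + post_if_not_e * (1 - p_e)

print(f"posterior if clue appears: {post_if_e:.3f}")
print(f"posterior if clue absent:  {post_if_not_e:.3f}")
print(f"expected posterior:        {expected_posterior:.3f}")
print(f"equals the prior:          {math.isclose(expected_posterior, p_h)}")
```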

Another thing that’s said about first impressions also bears noting:

People often try hard to make a good first impression. The challenge is to continue to examine your first impression of someone with an open mind as you have more time, information, and opportunity.

Filtered evidence, in its original formulation, was a set of evidence that had been chosen for the specific purpose of persuading you of something. Here I am widening the definition somewhat, and also applying it to cases where the other person cannot exclude all the evidence they dislike, but is nonetheless capable of biasing it in a direction of their choice. The evidence presented at a first meeting is usually filtered evidence. (Such situations are actually complicated signaling games, and a full Bayesian analysis would take into account all the broader game-theoretic implications. Filtered evidence is just one part of it.)

Evidence is an event tangled by links of cause and effect with whatever you want to know about. On a first meeting, a person might be doing their best to appear friendly, say. Usually being a friendly person will lead them to behave in specific ways which are characteristic of friendly people. But if they are seeking to convey a good impression of themselves, their behavior may not be caused by an inherent friendliness anymore. The behavior is not tangled with friendliness, but with a desire to appear friendly.
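One rough way to see why filtered evidence is weak: when nearly everyone acts friendly at a first meeting, friendly behavior is almost as likely from an unfriendly person as from a friendly one, so it barely moves the estimate. The sketch below uses assumed probabilities throughout, and the `posterior` helper is just a name chosen for this example.

```python
def posterior(prior, p_obs_if_true, p_obs_if_false):
    """Bayes' theorem for a hypothesis and its negation."""
    joint_true = p_obs_if_true * prior
    joint_false = p_obs_if_false * (1.0 - prior)
    return joint_true / (joint_true + joint_false)


prior_friendly = 0.5  # assumed prior that the person is genuinely friendly

# Unfiltered setting (assumed): friendly behavior comes mostly from
# friendly people, so observing it is informative.
unfiltered = posterior(prior_friendly, p_obs_if_true=0.90, p_obs_if_false=0.30)

# First meeting (assumed): almost everyone acts friendly to make a good
# impression, so the same behavior says little about the person.
filtered = posterior(prior_friendly, p_obs_if_true=0.95, p_obs_if_false=0.85)

print(f"unfiltered: {unfiltered:.3f}")
print(f"filtered:   {filtered:.3f}")
```

The same observed behavior produces a large update in one setting and a small one in the other, purely because of how it was generated.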

2. Consider each characteristic in light of the circumstances, not in isolation.

The second core tenet in “What is Bayesianism?” was “How we interpret any event, and the new information we get from anything, depends on information we already had.”

If you told me simply that a young man wears a large hoop earring, you couldn’t expect me to tell you what that entails. It might make a great parlor game, but in real life I would never hazard a guess based on so little information. If the man is from a culture in which most young men wear large earrings, it might mean that he’s a conformist. If, on the other hand, he is the son of a Philadelphia lawyer, he may be rebellious. If he plays in a rock band, he may be trendy.

A Bayesian translation of this might read roughly as follows. “Suppose you told me simply that a young man wears a large hoop earring. You are asking me to suggest some personality trait that’s causing him to wear it, but there is not enough evidence to locate a hypothesis. If we knew that the man is from a culture where most young men wear large earrings, we might know that conformists would be even more likely to wear earrings. If the number of conformists was sufficiently large, then a young man from that culture, chosen randomly on the basis of wearing earrings, might very likely be a conformist, simply because conformist earring-wearers make up such a large part of the earring-wearer population.

(Or to say that in a more mathy way, say we started with a .4 chance of a young man being a conformist, a .6 chance for a young man to be wearing earrings, and a .9 chance for the conformists to be wearing earrings. Then we’d calculate (0.9 * 0.4) / (0.6) and get a 0.6 chance for the man in question to be conformist. We don’t have exact numbers like these in our heads, of course, but we do have a rough idea.)

But then, he might also be the son of a Philadelphia lawyer, say, and then we’d get a good chance for him being rebellious. Or if he were a rock band member, he might be trendy. We don’t know which of these reference classes we should use; whether we should think we’re picking a young man at random from a group of earring-wearing young men from an earring-wearing culture or from all the sons of lawyers. We could try to take a prior over his membership in any of the relevant reference classes, saying for instance that there was a .05 chance of him being a member of an earring culture, or a .004 chance of him being the son of a lawyer and so on. In other words, we’d think that we’re picking a young earring-wearing man from the group of all earring-wearing men on Earth. Then we’d have a (0.05 * 0.6 =) 0.03 chance of him being a conformist due to being from an earring culture, et cetera. But then we’d distribute our probability mass over such a large number of hypotheses that they’d all be very unlikely: the group of all earring-wearing men is so big that drawing at random could produce pretty much any personality trait. Figuring out the most likely alternative of all those countless alternatives might make a great parlor game, but in real life it’d be nothing you’d like to bet on.

If you told me that he was also carrying an electric guitar… well, that still wouldn’t be enough to get a very high probability on any of those alternatives, but it sure would help increase the initial probability of the “plays in a rock band” hypothesis. Of course, he could play in a rock band and be from a culture where people usually wore earrings.”
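The arithmetic in the translation above can be reproduced in a few lines of Python. The probabilities are the ones given in the text; the variable names are just labels chosen for this sketch.

```python
# Numbers taken from the worked example in the text.
p_conformist = 0.4                # P(young man is a conformist)
p_earring = 0.6                   # P(young man wears earrings)
p_earring_given_conformist = 0.9  # P(wears earrings | conformist)

# Bayes' theorem: P(conformist | earring).
p_conformist_given_earring = (
    p_earring_given_conformist * p_conformist / p_earring
)
print(f"P(conformist | earring) = {p_conformist_given_earring:.2f}")  # 0.60

# Weighting by the prior of belonging to the earring-culture reference class.
p_earring_culture = 0.05
p_conformist_via_culture = p_earring_culture * p_conformist_given_earring
print(f"P(earring culture and conformist) = {p_conformist_via_culture:.2f}")  # 0.03
```

Once the estimate has to be weighted by a small prior for the reference class, even a fairly confident 0.6 shrinks to 0.03, which is the point about probability mass being spread thin.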

3. Look for extremes. The importance of a trait or characteristic may be a matter of degree.

This is basically just a reformulation of the above points, with an emphasis on the fact that extreme traits are easier to notice. But again, extreme signs don’t tell us much in isolation, so we need to look for the broader pattern.

The significance of any trait, however extreme, usually will not become clear until you learn enough about someone to see a pattern develop. As you look for the pattern, give special attention to any other traits consistent with the most extreme ones. They’re usually like a beacon in the night, leading you in the right direction.

4. Identify deviations from the pattern.

(I’ll skip this one.)

5. Ask yourself if what you’re seeing reflects a temporary state of mind or a permanent quality.

Again, any given observation has many different possible causes. Sometimes a behavior is caused not by any particular personality trait, but the person simply happening to be in a particular mood, which might be rare for them.

This is possibly old hat by now, but just to be sure: The probability that behavior X is caused by cause A, sayeth Bayes’ theorem, is the probability that A happened in the first place times (since they must both be true) the probability that A would cause X at all. That’s divided by the summed chance for anything at all, A included, to have caused X.
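Written as a formula, the statement might look as follows. The labels H_1, ..., H_n are notation assumed here for the full set of candidate causes of X, with A as one of them; they are taken to be mutually exclusive and to cover every possibility.

$$
P(A \mid X) = \frac{P(X \mid A)\, P(A)}{\sum_{i=1}^{n} P(X \mid H_i)\, P(H_i)}
$$

The denominator is the total probability of seeing X at all, so the numerator is always one of the terms in the sum. The more probability the other causes contribute, the smaller A’s share becomes.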

A pseudo-frequentist interpretation might compare this to the probability of drawing an ace out of a deck of cards. (I’m not sure if the following analogy is useful or makes sense to anyone besides me, but let’s give it a shot.) Suppose you get to draw cards from a deck, but even after drawing them you’re never allowed to look at them, and can only guess whether you’re holding the most valuable ones. The chance that you’ll draw a particular card is one divided by the total number of cards. You’d have a better chance of drawing it if you got to draw more cards. Imagine the probability of “(A happened) * (A would cause X)” as the number of cards you’ll get to draw from the deck of all hypotheses. You need to divide that by the probability that all hypotheses combined have, alternative explanations included, so think of the probability of the alternate hypotheses as the number of other cards in the deck. Then your chance of drawing an ace of hearts (the correct hypothesis) is maximized if you get to draw as many cards as possible and the alternative hypotheses have as little probability (as few non-ace-of-hearts cards in the deck) as possible. Not considering the alternate hypotheses is like thinking you’ll have a high chance of drawing the correct card, when you don’t know how many cards there are in the deck total.

If you’re hoping to draw the correct hypothesis about the reasons for someone’s behavior, then consider carefully whether you want to use the “this is a permanent quality” or the “this is just a transient mood” explanation. Frequently, drawing the “this is just a transient mood” cards will give you a better shot at grabbing the hypothesis with the most valuable card.