Why humans suck: Ratings of personality conditioned on looks, profile, and reported match
The recent OKCupid blog post, which gwern mentioned in the Media Open Thread, investigated the impact of three factors on users’ perceptions of each other: authority (reported match %), profile text (present or absent), and looks.
On the bright side, the authority versus reality match-up came out tied:
If you don’t consider that a good outcome, you’re not yet sufficiently cynical.
If a picture is worth a thousand words, what are a thousand words worth?
… Approximately nothing.
And the winner is...
OKC is a dating website, and some people are only looking for casual sex, so it does not surprise me that looks are more important than text, nor is this a bad thing. After all, whatever you write is going to attract some people and repel others, which largely cancels out, while beauty is more objective and will attract/repel everyone to a highly correlated degree. If you compared individual ratings rather than collapsing across all visitors (which seems to be what they did), then words would have a far greater effect.
Also, people looking for sex will (I imagine) rate many profiles and play a numbers game, whereas those looking for love spend far more time on individual messages. In the time it takes to assess one person’s personality you can judge dozens of people’s looks. This means that there is a sampling bias in ratings, weighted towards people who judge based on looks.
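To see the cancellation argument concretely, here is a toy simulation (purely illustrative; the 50/50 mix of a shared looks signal and an idiosyncratic text reaction is my own assumption, not anything from the OKCupid data):

```python
import numpy as np

rng = np.random.default_rng(0)
n_raters, n_profiles = 200, 100

# Shared "looks" component: something most raters roughly agree on.
looks = rng.normal(size=n_profiles)

# Idiosyncratic "text" component: each rater reacts to the words differently.
text_reaction = rng.normal(size=(n_raters, n_profiles))

# Each individual rating weighs the two components equally.
ratings = 0.5 * looks + 0.5 * text_reaction

# Collapsing across visitors (averaging) keeps the shared looks signal
# but washes out the idiosyncratic text reactions.
site_average = ratings.mean(axis=0)
print(np.corrcoef(site_average, looks)[0, 1])   # ~0.99: the average is almost pure looks
per_rater = [np.corrcoef(ratings[i], looks)[0, 1] for i in range(n_raters)]
print(np.mean(per_rater))                        # ~0.71: for any single rater, text still matters a lot
```

Averaging across visitors amplifies whatever component the raters share, which is exactly why a site-wide analysis makes looks appear dominant.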
Ahem. (See also: this, search for “mean and variance”.)
It is true that members of subcultures, such as goth girls, will have a far higher variance in perceived attractiveness. However, this is a fairly small percentage of the population. Furthermore, aspects such as facial symmetry are universally perceived as attractive, so even if you don’t like goth girls you can probably judge if one is more attractive than another.
Most of the examples in the OKCupid article weren’t goth girls. One was a girl who put a flower in her hair. The next girl had a round face.
More generally, there are guys who find a girl who is 1.80 m tall more attractive than one who is 1.60 m, since nowadays models are usually very tall and set the ideal of beauty for some people. Other guys prefer shorter women.
A lot of my own cultural conditioning doesn’t come from watching TV but from dancing salsa. As a result, I think I judge muscle tone as more important than the average guy does.
Different guys also care differently about factors such as weight or skin color.
The issue isn’t whether looks are objective (clearly they aren’t), but whether judgments of looks are more correlated among the userbase than those of personality.
(Actually, the degree to which judgments of personality are correlated is probably the more interesting question here (granting that interestingness isn’t particularly objective either). Robin Hanson has pointed to some studies suggesting that “compatibility” isn’t really a thing and that some people are just easier to get along with than others; the study in question, IIRC, didn’t take selection effects into account, but it remains an interesting hypothesis.)
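If the raw per-rater scores were available (the blog post only shows aggregates), that question could be answered directly. A minimal sketch of the measurement, with made-up rating matrices standing in for the real data:

```python
import numpy as np
from itertools import combinations

def mean_pairwise_correlation(ratings):
    """Average correlation between every pair of raters' scores.

    ratings: array of shape (n_raters, n_profiles), one rater per row.
    """
    pairs = combinations(range(ratings.shape[0]), 2)
    return np.mean([np.corrcoef(ratings[i], ratings[j])[0, 1] for i, j in pairs])

# Hypothetical stand-in data: each rater scores the same 50 profiles from 1 to 5,
# once for looks and once for personality.
rng = np.random.default_rng(1)
looks_scores = rng.integers(1, 6, size=(30, 50)).astype(float)
personality_scores = rng.integers(1, 6, size=(30, 50)).astype(float)

print(mean_pairwise_correlation(looks_scores))        # how strongly raters agree about looks
print(mean_pairwise_correlation(personality_scores))  # how strongly they agree about personality
```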
It’s not really clear that judging people via looks is bad.
If I see that a person has a huge tattoo, that allows me to infer something about their personality. People choose their own photos to make statements about themselves.
And in fact, I seem to recall OkCupid doing another informal study a couple of years ago on which profile pictures were best at getting replies and messages, and finding that these were not the ones which explicitly showed the person’s face and physique, but the ones which showed the person engaged in a cool activity (skiing, bungee jumping, swimming, etc.).
The positive side of this is that these are people who are willing to turn over some control in their mate selection process to an algorithm that considers self-reported interest and personality traits.
Given that well-calibrated algorithms are more effective than people in the vast majority of scenarios, I find it hard to fault the people willing to use them.
Given? Which “vast majority of scenarios” are we talking about and where do you get “well-calibrated algorithms” for them?
Deciding how to invest in the stock market. The algorithm for doing better than nearly all active investors is trivial.
Balancing a checkbook and paying bills on time. Software will never ‘forget about’ a bill.
Drum machines. Most self-professed drummers can be replaced by a drum machine, and it’ll be able to properly manage 7/4 timing and won’t speed up or slow down based on the amount of beer in its system.
Calendar software. It never forgets or confuses a date or birthday.
Grammar and spell checkers. Obvious.
Writing legible text to paper. Everyone can do it. Nobody does it as well as an HP LaserJet.
Department store layout and pricing software.
Inventory management software.
Given the above, the idea that mate selection software can do better than most individuals seems trivially obvious.
These examples show that there are plenty of situations in which well-calibrated algorithms can do better than many people. For me, saying that they can do better than people is a stronger statement than that; it means that the algorithms can do better than all people, or at least better than almost all people. In which case:
Investing. If you have an algorithm that will do better than all people, Warren Buffett and Renaissance Technologies would like a word with you. (I expect a lot of Renaissance Technologies’ decisions are made by computers, but I bet it would be misleading at best to describe what they do overall as just following an algorithm.)
Balancing a checkbook. Well, yes, no doubt computers are better at that. I’ll also grant you calendar software, maybe spell-checking, and inventory management. Computers are good at boring bookkeeping tasks, and no one claims otherwise.
Drumming. A drum machine is a very useful thing, but there’s absolutely no way that a drum machine is an acceptable replacement for a really good human drummer.
Grammar. Any competent writer is much better at this than any computer. (The human is more likely to make isolated one-off screwups, but that’s not really what matters most.)
Writing. Calligraphy is a thing, and it’s not a thing computers are good at.
Store layout and pricing. I don’t know about this. I wouldn’t be astonished if computers are better than the best humans at store layout, but I wouldn’t be astonished the other way around either. I bet humans are better at pricing, but I’m prepared to be convinced otherwise.
Mate selection… is exactly not the sort of task computers are best at: boring bookkeeping where the goal is to perform a clearly-defined task according to clearly-defined standards.
That’s just being pedantic. I did clearly state that “well-calibrated algorithms are more effective than people in the vast majority of scenarios”, and the vast majority of scenarios include average people, not exceptional people.
Investing: invest in index funds.
Drumming: while a really good human drummer gives you a lot more flexibility, the vast majority of human drummers are, quite frankly, crap.
Grammar: the vast majority of people who write are not competent. I am daily exposed to average individuals who are incapable of writing clear, concise, and proofread sentences.
Writing: calligraphy is also something done by a ridiculously small percentage of the population. The vast majority of the population is able to communicate via handwriting only because of the redundancies, error correction, and context of the language in which they are writing. Ask your friends to hand-write a string consisting of 64 characters in base-64 encoding; now enter that string into a terminal and check it against the original. Of the hundred people you know the best, how many successes do you think you would have? (A quick script for running this test is sketched after this list.)
Store layout and pricing: of course humans are better. That’s why pretty much all big supermarket chains use optimizing software, lots of data collection, and A/B experimentation, and manage to operate on razor-thin profit margins of half a percent or less. I’d totally trust humans with that.
Mate selection: contrary to your assertion, mate selection *is* one of those boring bookkeeping tasks computers will be better at. Mate selection really has a pretty small number of constraints: relationship goals, duration, children, etc. It’s also a problem that requires crunching through huge quantities of data, running correlations, and ferreting out what people are actually looking for instead of what they say they’re looking for. It’s not really any different from Netflix selecting movies I might like based on my viewing history and habits. Twenty years ago, you’d have been similarly arguing that movie selection is “not the sort of task computers are best at”.
Sure, a good relationship therapist with loads of experience and resources could probably do a better job than the algorithms we have today, but that is not the vast majority of the population. You need only look at today’s relationship failure and divorce rates to realize how godawfully bad people are at mate selection.
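For what it’s worth, the machinery behind the Netflix comparison is ordinary collaborative filtering. A bare-bones user-based sketch with a made-up ratings matrix, making no claim that this resembles whatever OkCupid actually runs:

```python
import numpy as np

# Rows = users, columns = candidate matches; entries are past ratings, 0 = never interacted.
ratings = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 0],
    [0, 1, 5, 4, 2],
    [1, 0, 4, 5, 3],
], dtype=float)

def cosine_sim(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom else 0.0

def recommend(user_idx, top_k=2):
    """Rank candidates the user hasn't rated, weighted by similar users' ratings."""
    user = ratings[user_idx]
    sims = np.array([cosine_sim(user, other) for other in ratings])
    sims[user_idx] = 0.0                 # ignore self-similarity
    scores = sims @ ratings              # weighted sum of everyone else's ratings
    scores[user > 0] = -np.inf           # don't re-recommend already-rated candidates
    return np.argsort(scores)[::-1][:top_k]

print(recommend(0))  # candidates user 0 hasn't rated yet, ordered by predicted interest
```

Real systems layer on far more (implicit feedback, regularization, business rules), but the shape of the problem is the same data-crunching exercise described above.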
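And if anyone actually wants to run the handwriting test from the Writing point above, a throwaway script along these lines would do it (the transcription just gets typed back in at a prompt):

```python
import base64
import os

# 48 random bytes encode to exactly 64 base-64 characters (no padding).
original = base64.b64encode(os.urandom(48)).decode("ascii")
print("Copy this string by hand:")
print(original)

# Later, type the handwritten copy back in and compare it to the original.
transcribed = input("Type the handwritten string here: ").strip()
errors = sum(a != b for a, b in zip(original, transcribed)) + abs(len(original) - len(transcribed))
print("Perfect copy!" if transcribed == original else f"{errors} character(s) wrong")
```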
I can’t help thinking of this XKCD cartoon.
It’s too bad they don’t have a “profile without picture” study to see if not being able to judge by sight influenced the rating/messaging.
They’ve previously published that first-messaging rates are drastically lower for profiles without pictures. IIRC even pretty bad pictures are better than no pictures at all. That would be a huge confounder.
There is something on that in that blog. It’s a little more complicated. They turned off pictures, and studied the rate of message responses, and the satisfaction users had with the resulting dates. They concluded that users strongly preferred to message good-looking users, but that preference for looks was not borne out in their satisfaction with dates.
It looks as if it might be worth manually disabling pictures (e.g. with HTTP Switchboard) and browsing profiles seeing only the text.
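Short of a browser extension, a crude way to preview a saved profile page as text only (a stdlib-only sketch; profile.html is just a placeholder name for a page you’ve saved locally):

```python
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Collect visible text; <img> tags carry no text content, so pictures simply vanish."""
    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1
    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())

parser = TextOnly()
with open("profile.html", encoding="utf-8") as f:  # a profile page saved locally (placeholder name)
    parser.feed(f.read())
print("\n".join(parser.chunks))
```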
Now I’m interested in the steepness of that line, and in the fact that personality scores seem to be lower than “looks” scores. Also, are universities using OkCupid as a resource in their studies? I know one university has famously used Facebook, but OkCupid seems much more open and amenable to this kind of thing.
That says to me that the variance in people’s estimates of personality is higher than the variance in their estimates of looks (although it’s modulated by looks), which doesn’t sound too unreasonable. It still centers around 3, though, so the average is probably about the same.
I’m surprised I don’t see more discontinuity around 4 on either axis, which marked (when I last used OKCupid) the system’s only significant threshold: a rating of 4 or higher delivered a vague message about having an admirer, and mutual ratings of 4 or higher meant that the system dropped the coy act and just told you who liked you. Maybe they changed that before collecting this data.