Why in the world does this make it worthless? They’ve discovered an effect above and beyond any possible aesthetic trends in the assorted races; if they hadn’t included that control, the data could be written off with “well, maybe white men are just all so ridiculously attractive, of course people write them back”.
If you want to predict someone’s actual response rate, the model is fine: you can get an accurate prediction by plugging that person’s information into the model, provided they’re in the attractiveness stratum of the data pool used to fit the model. For OkCupid’s claimed causal inference, though, we’re interested in counterfactual questions like, “What sort of response rate would black woman X get if she were white?” We can’t twiddle just the race variable in the model to get the answer: if X were white, her attractiveness rating would be different too, and she might end up in a different stratum than the one addressed by the model. Since there are no results for that counterfactual stratum, the model cannot address the counterfactual, i.e., it can’t be used to make causal claims.
ETA: Andrew Gelman calls this the fallacy of controlling for an intermediate outcome.
Thanks for the link, that description applies to OkCupid’s analysis perfectly.
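For anyone who wants to see the intermediate-outcome problem in numbers, here is a toy simulation. It is only a sketch: the two groups, the parameter values, and every function name are invented for illustration, and none of it is fitted to OkCupid's data. The assumed setup is that both groups have the same underlying appeal, raters' bias pushes group 1's attractiveness rating down, and there is also a direct penalty on group 1's replies.

```python
import math
import random
from collections import defaultdict


def simulate(n_per_group=20000, rating_penalty=1.0, direct_penalty=0.05, seed=0):
    """Return (group, rating, reply_score) rows for two hypothetical groups."""
    rng = random.Random(seed)
    rows = []
    for group in (0, 1):
        for _ in range(n_per_group):
            appeal = rng.gauss(0.0, 1.0)  # same distribution in both groups
            # Assumed mechanism: rater bias lowers the rating of group 1.
            rating = appeal - rating_penalty * group + rng.gauss(0.0, 0.5)
            # Linear stand-in for reply rate, with a direct bias against group 1.
            reply_score = 0.3 + 0.1 * appeal - direct_penalty * group
            rows.append((group, rating, reply_score))
    return rows


def mean(values):
    return sum(values) / len(values)


def raw_gap(rows):
    """Group 1 minus group 0, with no control for rating."""
    return (mean([y for g, _, y in rows if g == 1])
            - mean([y for g, _, y in rows if g == 0]))


def stratified_gap(rows, width=0.5):
    """Group 1 minus group 0 within rating strata, averaged by stratum size."""
    strata = defaultdict(lambda: ([], []))
    for g, rating, y in rows:
        strata[math.floor(rating / width)][g].append(y)
    total, weighted = 0, 0.0
    for group0, group1 in strata.values():
        if group0 and group1:  # skip strata that contain only one group
            size = len(group0) + len(group1)
            weighted += size * (mean(group1) - mean(group0))
            total += size
    return weighted / total


rows = simulate()
print("raw gap:", round(raw_gap(rows), 3))
print("gap within rating strata:", round(stratified_gap(rows), 3))
```

With these made-up numbers the raw gap comes out negative, close to the direct penalty, while the gap within rating strata flips positive: at an equal rating, group 1 members have higher underlying appeal, and that more than offsets the penalty. Change the two penalty parameters and the controlled gap can land on either side of zero, so the post-control number alone does not tell you the sign of the bias.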
The problem is that if you’re trying to detect racism, the variables you’re controlling for had better be independent of racism, which in this case they obviously aren’t. Actual racism could go either way and still be consistent with their findings. At the very least I’d like to see the data before and after applying the control; this would give us more information, but probably wouldn’t let the OkCupid team make sweeping generalizations about me like “White guys are shitty” (actual quote).
Attractiveness is wildly subjective. If people find the features of minority races to be less attractive than the features of white people, that just is a type of racism. Any possible objective standard of physical attractiveness (candidates include symmetry, youth, koinophilia relative to the world population, health) would have only the weakest possible correlation with race.
Nearby in this thread, gwern gave an example of how an attractiveness rating can be influenced by factors other than actual subjective attractiveness to the rater… and how those factors can be related to, yes, race. If after reading his comment you still think you know an unambiguous way to interpret the post-control data that OkCupid published, I’d really like to hear it. To me the whole situation looks more like a trainwreck.
They also ignore massive selection effects. The most significant, and most obvious, is the fact that most OkCupid users are white. If I were a black or Asian person interested only in other people of my race, it would be much wiser for me to use one of the sites dedicated to people of my race, which do exist. If I’m white and I only want to date white people, there are so many on OkCupid that I’ll be just fine.