One key factor in metrics is how the number relates to the meaning. We’d prefer metrics that have scales which are meaningful to the users, not arbitrary. I really liked one example I saw recently.
In discussing this point in a paper entitled “Arbitrary metrics in psychology,” Blanton and Jaccard (doi:10.1037/0003-066X.61.1.27) first point out that Likert scales are not so useful. They then discuss the (in)famous Implicit Association Test (IAT), where the scale is a direct measurement of the quantity of interest, but note that “The metric of milliseconds, however, is arbitrary when it is used to measure the magnitude of an attitudinal preference.” Therefore, when thinking about degree of racial bias, “researchers and practitioners should refrain from making such diagnoses until the metric of the IAT can be made less arbitrary and until a compelling empirical case can be made for the diagnostic criteria used.” They go on to discuss norming measures and looking at variance, but if the base measure is not meaningful, any transformation of it is of dubious value.
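To make that last point concrete, here is a minimal sketch (with invented latency numbers, purely for illustration) of what norming an arbitrary metric does: z-scoring IAT-style millisecond differences produces clean, unit-free numbers, but those numbers only locate a subject within this sample; they still say nothing about the real-world magnitude of bias.

```python
# Hypothetical per-subject latency differences in milliseconds
# (invented data, illustrating the point, not real IAT results).
latency_diff_ms = [38, 52, 45, 61, 30, 49, 55, 41]

n = len(latency_diff_ms)
mean = sum(latency_diff_ms) / n
# Sample variance and standard deviation (n - 1 denominator).
var = sum((x - mean) ** 2 for x in latency_diff_ms) / (n - 1)
sd = var ** 0.5

# "Norming": express each subject relative to the sample.
z_scores = [(x - mean) / sd for x in latency_diff_ms]

# The z-scores are tidy (mean 0, variance 1), but they inherit the
# arbitrariness of the underlying metric: a z of 1.5 tells you a
# subject is high *for this sample*, not how much bias that implies.
```

The transformation is mechanically fine; the problem is that no amount of rescaling connects milliseconds to a magnitude of attitudinal preference.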
Going beyond that paper and looking at the broader literature on bias, we can come up with harder-to-measure but more meaningful measures. The probability of hiring someone based on racially coded names might be a more meaningful indicator, but probability is also not a clear indicator, and using names as a proxy obscures whether the measurement is capturing class-based or racial bias. It’s also not clear how large an effect a given difference in probability represents, even though probability is directly meaningful.
A very directly meaningful measure of bias that is even easier to interpret is dollars. This is immediately meaningful; if a person pays a different amount for identical service, that is a meaningful indicator of not only the existence but also the magnitude of a bias. Of course, evidence of pay differentials is a very indirect and complex question, but there are better ways of getting the same information in less problematic contexts. Evidence can still be direct: how much people bid for watches photographed on a black or white person’s wrist is a much more direct and useful way to understand how much bias is being displayed.
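The appeal of the dollar scale is that the effect size needs no further translation. A sketch with invented bid amounts (the conditions and numbers are hypothetical, not data from any actual study):

```python
# Hypothetical auction bids, in dollars, for the same watch
# photographed on different wrists (invented illustrative data).
bids_condition_a = [92.0, 85.5, 101.0, 78.0, 95.0]
bids_condition_b = [81.0, 74.5, 88.0, 70.0, 83.5]

mean_a = sum(bids_condition_a) / len(bids_condition_a)
mean_b = sum(bids_condition_b) / len(bids_condition_b)

# Unlike a z-score or a millisecond gap, this number is directly
# interpretable: the average dollar amount attributable to the
# manipulated condition.
differential_dollars = mean_a - mean_b
```

No norming step is needed to interpret the result; a ten-dollar differential means ten dollars, which is exactly the property the millisecond scale lacks.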
See also: https://twitter.com/JessieSunPsych/status/1333086463232258049
Oh man, I wish you’d come in under the deadline.
For people who don’t feel like clicking: it’s a quantification of behaviors predicted by different scores on the Big Five.