Measuring open-mindedness
We recently opened an experimental website for Rational Discussion of Politics. A special feature of the new website is an automated recommendation system that learns user preferences from their voting records. The purpose of this feature is to improve the quality of discussion without using any form of censorship.
The recommendation system was previously tested with the help of 30 members of a political discussion forum. The tests showed that most user preferences can be reasonably well described by just two parameters. The system chooses the parameters (principal vectors) on its own, based only on the numerical data (comment ratings), but it was easy to see that one vector corresponded to the “left-wing – right-wing” axis and the other to the “well written – poorly written” axis.
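For readers who want a concrete picture, here is a minimal sketch of how two such “principal vectors” could be extracted from a user-by-comment rating matrix, for example with a truncated SVD. This is an illustration only, not the site’s actual code; all names and numbers are made up.

```python
# Minimal sketch (not the site's actual algorithm): extracting two "principal
# vectors" from a user-by-comment rating matrix via SVD.
# Votes are assumed to be encoded as +1 (upvote), -1 (downvote), 0 (no vote).
import numpy as np

def principal_vectors(ratings, k=2):
    """ratings: (n_users, n_comments) array of +1/-1/0 votes."""
    centered = ratings - ratings.mean(axis=0)      # remove the average preference per comment
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]                            # top-k directions in comment space
    user_scores = centered @ components.T          # each user's coordinate on each axis
    return components, user_scores

# Example with random votes from 30 hypothetical users on 200 comments
rng = np.random.default_rng(0)
votes = rng.choice([-1, 0, 1], size=(30, 200), p=[0.2, 0.6, 0.2])
components, scores = principal_vectors(votes)
print(scores.shape)   # (30, 2): one coordinate per axis for each user
```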
About a month ago we started discussions on the new website. This time all our participants were LW members, and the results were very different. There was relatively little variation along the “well written – poorly written” axis. There was significant variation along what seemed to be the political-views axis, but it could no longer be perfectly described by the conventional “left-wing” and “right-wing” labels. For the moment, we adopted the terms “populares” and “optimates” for the two camps (the former seems somewhat correlated with “left-wing/liberal” and the latter with “right-wing/libertarian”).
The results showed an interesting asymmetry between the camps. In the previous tests, both left- and right-leaning users upvoted users from their own camp much more frequently, but one group was several times more likely to upvote its opponents than the other. Among the “populares” and “optimates” the asymmetry was much weaker (currently 27%), but still noticeable.
In both cases our sample sizes were small and may not be representative of the LW community or the US population. Still, it would be interesting to find an explanation for this asymmetry. One possibility is that, on average, one side presents significantly better arguments. Another possibility is that the other group is more open-minded.
Can anyone suggest a test that can objectively decide which (if any) hypothesis is correct?
I hate to ask, but have you controlled for the number of posts on both sides of the axis? If one side outnumbers the other, that could explain the divergence.
Sure.
The system assigns the “left-wing” and “right-wing” (“populare” and “optimate”) labels by comparing each user’s preferences to the average preferences of all users, so both sides are nearly equal in size. In any case, the 27% difference was in the proportions of positive votes, not in the absolute numbers of upvotes.
Let’s say 25% of your users are inherently “optimate”, 50% are inherently “populare”, and 25% aren’t really either.
Would your algorithm sort the people who don’t strongly agree with either side with the “optimates”, since their preferences are closer to the “optimate” group than the “populare” group? And would that produce the effect you’re seeing, since half the “optimate” group are upvoting more or less equally?
In principle, this is possible. The system assigns each user a number corresponding to his/her position on the “left-right” (“populare-optimate”) axis. If, based on their votes, 25% of users are assigned “-10”, 50% are assigned “10”, and 25% are assigned “0”, then the average is 2.5, which would make those with “0” into “left-wingers”.
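To make the arithmetic concrete, here is a tiny sketch of that hypothetical scenario (purely illustrative numbers, not our actual data):

```python
# Hypothetical check of the labeling rule described above: users are labeled
# by comparing their axis score to the average score of all users.
import numpy as np

scores = np.array([-10] * 25 + [10] * 50 + [0] * 25)  # 25% at -10, 50% at +10, 25% neutral
mean = scores.mean()                                   # 0.25*(-10) + 0.5*10 + 0.25*0 = 2.5
labels = np.where(scores < mean, "left", "right")      # threshold at the average
print(mean)                                            # 2.5
print((labels[scores == 0] == "left").all())           # True: the neutral users end up "left"
```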
At least in our first group (where the effect was the strongest and the distribution was pretty close to Gaussian), this is not what happened.
Is it correct to say that you’re basing your assignment of each user to the two categories on the same variable you’re analyzing, namely the distribution (or, more specifically, the clustering) of the votes? (My reading suggests the system produces the vectors you’re noticing based on clustering, and then you name the vectors?)
Yes.
Okay, why have you elevated the hypothesis of open-mindedness?
Without looking at the data, I couldn’t say with certainty what the dominant cause is, but I can reasonably confidently say that your clustering algorithm, with its built-in assumption of a roughly even divide on both sides of its vectors, is responsible for at least part of it.
The prime issue is that you are algorithmically creating the data—the clusters—you’re drawing inferences on. Your algorithm should be your most likely candidate for -any- anomalies. You definitely shouldn’t get attached to any conclusions, especially if they’re favorable to the group of people you more closely identify with. (It’s my impression that the “open-mindedness” conclusion -is- favorable to the people you identify with, given that you give it higher elevation than the possibility that the opposing side is producing better arguments.)
Suppose people are divided by some arbitrary criterion (e.g., blondes vs. brunettes) and then it turns out that blondes upvote brunettes much more often than vice versa. You could still ask the same question.
Regarding elevation, I simply wanted a short, easy-to-understand title, and it did not occur to me that it would be perceived as prejudicial.
Except in this case you’re grouping on the same behavior you’re measuring. Given that you’re doing statistical analysis on what is essentially traffic-analysis-grouped data, I can’t think of a trivial example to compare it to. That’s bound to lead to some variable-dependency issues.
And I think you did realize that, given your care in not naming names or sides. But I’m not attacking you; I’m suggesting you should be cautious in drawing conclusions. You want to measure, so you’re not taking it as a given, which is good skepticism, but you skipped skepticism of your own techniques.
Suppose, for the sake of argument, that my own data is totally wrong, and consider the same question for a purely hypothetical case:
Group A upvotes only its own comments. Group B preferentially upvotes its own comments. Is there a way to tell whether the difference lies in the comment quality or in the characters of the group members?
I’d say your hypothetical case is undecidable on multiple levels, starting with how to determine comment quality in the first place, the very definition of which may vary between Group A and Group B.
If you measure personality via a Big Five personality test, you can see whether the ratings correlate with the personality traits.
I’m sure there will be some correlations, but I would not know what to do with them. Traits like conscientiousness have no obvious connection to my question. Openness to experience is sometimes used as a proxy for open-mindedness, but to me this seems a little far-fetched. Is there a strong reason to believe that an adventurous eater will be more open-minded on political questions?
Turn the problem on its head. Pause in the effort to optimize positive outcomes and devote resources to minimizing negative outcomes. Find what has failed the most and address that. What has succeeded has succeeded.
You shouldn’t make cross-participant comparisons based on a small non-randomized data set. Stick to within-subject comparisons if you want to analyze your data or look at other studies that have already analyzed your questions.
The answer, by the way, is that conservatives and liberals score about equally on political knowledge and intelligence, but liberals are significantly more open-minded. I wouldn’t be at all surprised, however, if you’re getting conservatives upvoting liberals more, which is likely a selection effect.
First you need to get precise about what you are testing—in particular, define “significantly better arguments”.
I don’t know the details of your statistics, but is it possible that the way you are choosing “principal vectors” is entangled with how the resulting clusters rate each other?
The word “better” may be replaced with “more coherent” or even “more grammatically correct”. Fundamentally, the question is whether the difference in ratings arises from differences in comment quality (other than political orientation) or from differences in the people who rate the comments.
The system chooses the vectors automatically. But I think the above question would still be valid even if people were divided into two groups in some totally arbitrary way.
“Noticeable” is not a word that usually appears in statistics. Scientific papers instead speak about whether or not effects are statistically significant.
Another question is whether the vectors you have found are robust. Do they change when you drop a few users, or do they stay the same? If they change, then it’s not clear that you have found a reliable category.
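One way to run such a robustness check, sketched here only as an illustration (it assumes the axes come from something like an SVD of the centered vote matrix, which may not match the actual system), is to recompute the top vector after dropping a few users and compare it to the original:

```python
# Illustrative robustness check: does the top "principal vector" survive
# dropping a few users? ratings is a (n_users, n_comments) numpy array of votes.
import numpy as np

def top_vector(ratings):
    """First principal vector of the centered user-by-comment vote matrix."""
    centered = ratings - ratings.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

def axis_stability(ratings, n_trials=20, n_drop=3, seed=0):
    """Average |cosine similarity| between the full-data axis and axes
    recomputed with a few users dropped. The sign of a principal vector
    is arbitrary, hence the absolute value."""
    rng = np.random.default_rng(seed)
    v_full = top_vector(ratings)
    sims = []
    for _ in range(n_trials):
        keep = rng.choice(len(ratings), size=len(ratings) - n_drop, replace=False)
        v_sub = top_vector(ratings[keep])
        cos = abs(v_full @ v_sub) / (np.linalg.norm(v_full) * np.linalg.norm(v_sub))
        sims.append(cos)
    return float(np.mean(sims))

# Values near 1.0 suggest the axis is robust to removing users;
# values well below 1.0 suggest it depends on a handful of voters.
```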
Reading things into data that aren’t there happens quite often in statistics, and I’m not sure that in this case there’s enough data to draw strong conclusions.
As I’ve written above, the two groups may not be representative of the LW community or the US population. But within each group the differences were statistically significant, so the question about their origin would be valid in any case.
If “significant” means statistical significance, then what’s the p-value?
In the “optimate” vs. “populare” case, the difference was significant at about 2.5 sigma. I don’t remember the exact values in the “left” vs. “right” case, but it was over 10 sigma.
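For reference, under a normal approximation those sigma levels correspond to the following two-sided p-values (a quick sketch, assuming the quoted sigmas are z-scores for the difference in upvote proportions):

```python
# Converting sigma levels into two-sided p-values under the standard
# normal approximation.
import math

def two_sided_p(z_sigma):
    """Two-sided p-value for a z-score under the standard normal distribution."""
    return math.erfc(z_sigma / math.sqrt(2))

print(two_sided_p(2.5))    # ~1.2e-2
print(two_sided_p(10.0))   # ~1.5e-23
```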