Meta: social influence bias and the karma system
Given LW’s keen interest in bias, it seems pertinent to be aware of the biases engendered by the karma system. Note: I used to be strictly opposed to comment-scoring mechanisms, but witnessing the effectiveness with which LWers use karma has largely redeemed the system for me.
In “Social Influence Bias: A Randomized Experiment” by Muchnik et al., random comments on a “social news aggregation Web site” were up-voted after being posted. The likelihood of these rigged comments receiving additional up-votes was then quantified against a control group. The results show that users were significantly biased toward the randomly up-voted comments:
The up-vote treatment significantly increased the probability of up-voting by the first viewer by 32% over the control group … Up-treated comments were not down-voted significantly more or less frequently than the control group, so users did not tend to correct the upward manipulation. In the absence of a correction, positive herding accumulated over time.
At the end of the five-month test period, comments that had artificially received an up-vote had an average rating 25% higher than those in the control group. Interestingly, the severity of the bias depended heavily on the topic of discussion:
We found significant positive herding effects for comment ratings in “politics,” “culture and society,” and “business,” but no detectable herding behavior for comments in “economics,” “IT,” “fun,” and “general news”.
The herding behavior outlined in the paper seems rather intuitive to me. If, before reading a post, I see a little green ‘1’ next to it, I’m probably going to read it in a better light than if I hadn’t seen that little green ‘1’. Similarly, if I see a post with a negative score, I’ll probably spot flaws in it much more readily. One might say that this is the point of the rating system: it allows the group as a whole to evaluate the content. Still, I’m unsettled by just how easily popular opinion was swayed in the experiment.
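To get a feel for how a single early up-vote can snowball, here is a minimal Monte Carlo sketch. It is not the paper’s model: the per-viewer vote probabilities and the number of viewers are invented, and the 32% up-vote boost from the quoted first-viewer result is applied whenever the running score is positive, which is a simplification of the paper’s setup.

```python
import random

# Toy sketch of positive herding; parameters are assumptions, not from the paper.
BASE_UP, BASE_DOWN = 0.05, 0.05   # assumed per-viewer up/down-vote probabilities
HERDING_BOOST = 1.32              # up-vote multiplier applied while the score is positive
N_VIEWERS, N_COMMENTS = 200, 5000

def final_score(treated: bool) -> int:
    score = 1 if treated else 0   # the manipulation: one free up-vote at posting time
    for _ in range(N_VIEWERS):
        p_up = BASE_UP * (HERDING_BOOST if score > 0 else 1.0)
        r = random.random()
        if r < p_up:
            score += 1
        elif r < p_up + BASE_DOWN:  # the down-vote rate is left uncorrected
            score -= 1
    return score

control = sum(final_score(False) for _ in range(N_COMMENTS)) / N_COMMENTS
treated = sum(final_score(True) for _ in range(N_COMMENTS)) / N_COMMENTS
print(f"control mean score: {control:.2f}, treated mean score: {treated:.2f}")
```

Under these made-up numbers the treated comments should end with noticeably higher average scores, simply because the head start is never corrected and buys extra time in the boosted regime.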
This certainly doesn’t mean we need to reprogram the site and eschew the karma system. Rather, understanding the biases inherent in such a system will let us use it more effectively. Discussion of how this bias affects LW in particular would be welcome. Here are some questions to begin with:
Should we worry about this bias at all? Are its effects negligible in the scheme of things?
How does the culture of LW contribute to this herding behavior? Is it positive or negative?
If there are damages, how can we mitigate them?
Notes:
The paper mentions that comments were not sorted by popularity, thereby “mitigating the selection bias.” This of course implies that the bias would be more severe on forums where comments are sorted by popularity, such as this one.
For those interested, another enlightening paper is “Overcoming the J-shaped Distribution of Product Reviews” by Nan Hu et al., which discusses rating biases on websites such as Amazon. User gwern has also recommended the longer 2007 paper by the same authors on which the one above is based: “Why do Online Product Reviews have a J-shaped Distribution? Overcoming Biases in Online Word-of-Mouth Communication”.
Huh. That sounds exactly like the results from my little LW experiment: http://www.gwern.net/Anchoring
Let’s say I read a post that’s math-heavy. I don’t understand all the math, but otherwise the post seems great.
Do I upvote the post?
If the post has three votes, all of them upvotes, I will probably upvote. If it has three votes, all of them downvotes, I won’t, because that probably means there’s an error in the math that I don’t see, since my math background isn’t as strong as that of some other people on LW.
Did the upvotes or downvotes bias me in a negative way just because I changed my behavior? I don’t think so. They provided meaningful information on which I based my decision.
But the point of voting is for you to be a provider of information, not a consumer of it. If your vote simply reflects the information already available from the other votes, what have you added? Put in LW terms, your vote should be entangled with information unique to you. If the only information it is entangled with is other votes, then you’re just perpetuating an information cascade. This isn’t Hollywood Squares; the point of voting isn’t to “win”, and it’s not to pat yourself on the back for upvoting useful posts. It’s to provide people with useful information about whether the post is useful. When deciding whether to upvote, you shouldn’t be asking “Do I think this post is useful?” but “Given the current vote total, is this post more likely to be useful than other posts with the same total?”
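To make the cascade point concrete, here is a toy comparison (the quality number, the signal model, and the copying rule are all invented for illustration; this is not a model of LW’s actual system): voters who use only their own judgement produce scores that track quality, while voters who copy the running total produce scores that mostly track whichever way the first few votes happened to fall.

```python
import random

# Toy illustration of an information cascade in voting; everything here is assumed.
N_VOTERS = 50
N_TRIALS = 5000
QUALITY = 0.6   # each voter's own judgement says "up-vote" 60% of the time

def independent_score() -> int:
    # Every voter votes from their own judgement only.
    return sum(1 if random.random() < QUALITY else -1 for _ in range(N_VOTERS))

def cascading_score() -> int:
    # The first few voters use their own judgement; later voters just copy
    # the sign of the running total instead of adding information.
    score = 0
    for i in range(N_VOTERS):
        own_judgement = random.random() < QUALITY
        vote_up = own_judgement if (i < 3 or score == 0) else (score > 0)
        score += 1 if vote_up else -1
    return score

ind_neg = sum(independent_score() < 0 for _ in range(N_TRIALS)) / N_TRIALS
cas_neg = sum(cascading_score() < 0 for _ in range(N_TRIALS)) / N_TRIALS
print(f"P(a good post ends net-negative): independent {ind_neg:.0%}, cascading {cas_neg:.0%}")
```

With independent voters a good post rarely ends up net-negative; with copy-voters it does so roughly whenever the first few votes happen to break the wrong way.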
In that case I haven’t added any information, but I also haven’t put any wrong information into the system. In most cases voting will be a combination of adding new information and repeating back information that’s already available.
While not the main point of voting, feeling good about upvoting useful posts is desirable. Having a sense of community is good. Cooperation among people on LW is good. We don’t want to foster an environment where everyone feels like he’s on his own, but an environment where people feel like they are cooperating with each other.
I disagree. If a lot of people find a post useful because it helped them, I think that’s a valuable signal for the person who wrote the post. And asking voters an easy question leads to more voting than asking a more complicated one.
You put in the information that there was a fourth independent vote, when in fact there wasn’t.
But the upvotes themselves don’t affect how useful the post is; they just affect how useful you think it is.
Is the point of karma to provide feedback to the poster or to the potential audience? It seems like both are being used to decide whether to upvote. Maybe there should be explicitly separate karma scores for the different goals.
Perhaps posts should be evaluated on more comprehensive criteria:
- Helpful to me
- Relative value compared to other posts
- Feedback on post quality to the author
I’m not really sure where the line is between too few things to rate and too many, though. But I’ve never been comfortable with single-number systems.
This paper is downloadable from http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2369332
It seems to be a summary of their much longer 2007 paper: “Why do Online Product Reviews have a J-shaped Distribution? Overcoming Biases in Online Word-of-Mouth Communication”
The link has been updated and the longer paper added. Thank you for sharing.
What we want is for each voter to provide independent evidence about the quality of the post, so voters influencing each other is a bad thing. We already have the system where, for a young discussion post, the karma score shrinks to a dot, so the first voters have a slightly harder time seeing how their peers are voting. This doesn’t do anything to prevent bias in my votes, though, as I get to discussion threads via this page, where the scores are visible.
Now that most LessWrong discussion seems to occur in open threads and other big threads like that, maybe a similar feature for comments would be a good idea?
The other thing we can do to deal with it is to just take vote scores on comments slightly less seriously, keeping in mind that they may be due to this bias.
How do they distinguish whether people just read upvoted comments more, or are more likely to upvote a comment after actually reading it than when it is not upvoted? I know that I personally am more likely to read highly upvoted comments, for the simple reason that I think they are less likely to waste my time. The comments I don’t read, I neither upvote nor downvote.
That hypothesis predicts that the manipulated comments would also have received more downvotes, but they didn’t. For an example that sounds like your hypothesis, see the claim that Eliezer’s comments get large karma scores, both positive and negative.
Isn’t this the whole point of the AntiKibitzer preference?
Depends what herd one is a part of. I think I’m part of the smaller contrarian herd that uses karma to “do my bit” in countering what I see as excesses of the herd.
Just for reference, someone (or a group) is neg-bombing me. Over the course of a few minutes I plummeted 15 karma, with a −1 on almost every post I’ve made. A few minutes later it dropped 3 more as I made new comments, and I suspect it’s going to continue to drop quickly.
I hope this isn’t considered an acceptable use of the karma system. I might think that a lot of people simply disagreed, except that not a single downvote came with a comment explaining it, and the votes dropped so fast and so evenly across my comments on totally unrelated subjects that no one could have accidentally stumbled onto all of them that quickly, i.e. in about 2–5 minutes. It all happened within one page refresh.
Perhaps if someone is a first rater, ze should be especially thoughtful, or err on the positive side in zir ratings, given that early ratings have an undue influence: an early negative rating can prevent later people from viewing the comment/post, while a later rating is unlikely to have as big an effect. An undeservedly positive rating is likely to be responded to (if only indirectly) through later comments and ratings, since it increases the likelihood of later people viewing the post, while an undeservedly negative one will not be, since it (potentially) reduces the chances of later people seeing the post.
Additionally, I wonder if people on LW are more concerned about appearing naive or uncritical by accidentally commenting favorably on a low-quality post than appearing too critical by criticizing a high-quality post. I think there might be a bias there, but it might just be that it’s easier to add value to a low-quality post by pointing out an obvious flaw than to a high-quality post.
For posts, I use the vote as an indication of what the LW consensus on the post is. So if the title is not that promising and the score is low, I often don’t read it. If I do read it, though, I try to account for the “bias” of the up-/downvote and make an effort to come to an independent evaluation. So I don’t really think it’s an issue.