Well, here I am again, this time providing a paper backing up my claim that having a downvote mechanism at all is just pure poison.
It doesn’t make any sense for this type of community. This isn’t Digg. We’re not trying to rate content so an algorithm can rank it as a news aggregation service.
Look at Slate Star Codex, where everybody is spending their time now—no aversive downvote mechanism, relaxed, cordial atmosphere, extremely minimal moderation. Proof of concept.
Just turn off the downvote button for one week and if LessWrong somehow implodes catastrophically … I’ll update.
For what it’s worth I find the SSC comment section pretty unreadable, since it is just a huge jumble of good and bad comments with no way to find the good ones.
There’s also a significant amount of astroturfing from various sources that muddies the water further.
?? Such as?
Presumably p-m primarily means the neoreactionaries.
I don’t think that’s astroturfing; I think it’s just that Scott’s one of the few semi-prominent writers outside their own sphere who’ll talk to NRx types without immediately writing them off as hateful troglodytic cranks. Which is to his credit, really.
That’s fair, but I think it was probably what paper-machine was referring to.
More or less. They’re not the only ones, of course, but perhaps they’re the most obvious.
I wouldn’t call that astroturfing; I’d say it’s more about wanting anyone to talk to. The lack of a rating system means people don’t get downvoted to oblivion; instead they get banned if they break the house rules badly enough. (I’m surprised James A. Donald lasted as long as he did there.)
I don’t know what “that” you and Nornagest are referring to, so I have no way of knowing if “that” is really astroturfing or not. On the other hand, six comments about the appropriateness of a single word seems like overkill. On the gripping hand, it appears the community wants more of it, so by all means, continue.
I mean the neoreactionaries on SSC.
I meant that I haven’t seen any strong evidence of astroturfing on SSC (by the conventional definition of “a deceptive campaign to create the appearance of popular support for a position, usually involving sockpuppets or other proxies”), and that the presence of an unusually large and diverse neoreactionary contingent is more easily explained by the reasons I gave.
What did you mean by it? NRx, sure, but what about them, and who’re the others you alluded to upthread? If we’re just arguing over definitions, giving them explicitly seems like the best way to drive a stake into the argument’s heart—and if you’ve noticed some bad behavior that I haven’t, I’d like to know about that too.
I appreciate your skepticism, but I doubt I can find enough evidence to convince you that NRs do this intentionally. Most of the trouble comes from not being able to find tweets from months ago unless you know exactly what you’re looking for, provided they still even exist (e.g., Konk). I’m looking into the PUAs for examples, but I don’t know their community as well.
If it’s the word you object to, perhaps “meatpuppetry” is better? I don’t really see much of a difference, as they both involve manufacturing the appearance of support through multiple accounts.
So, uh, sorry. I really thought this would be easier to show than it turned out to be.
So if I’m following this correctly, you think that the neoreactionary activity on SSC is thanks to an organized effort to create the appearance of support, but not by deceptive means? That is, Scott posts something relevant to their interests, the first neoreactionary to find it tweets “hey, come back me up”, and suddenly half the NRx sphere is posting in the comments under their standard noms de blog?
I’m still not convinced, but I’d find that more plausible than astroturfing by my understanding of the word. Not sure what I’d call it, though; “brigading” is close, but not quite it. And I’m not even sure where I’d draw the line; the distinction between “check out this cool thing” and “help me burn this witch” is awfully fine, especially when the cool thing is (e.g.) an anti-FAQ.
“Dogpiling” is the word I’ve seen.
Swarming?
As an aside, I have doubts that the neoreactionaries are *that* interested in gaming Yvain’s blog...
They’re massively interested in controlling their presence on the Internet.
So one example of a pattern that I saw worked like this:
1. Someone writes a comment critical of NR.
2. Someone else posts a tweet calling the commenter names and linking to their comment.
3. Suddenly multiple NRs come out of the rafters to reply to #1.
I’d give you actual links, but I can’t trick Twitter into showing me tweets from months ago anymore, and they’ve probably been deleted anyway.
The MRAs and PUAs have been known to do the same thing.
I call this astroturfing because an unrelated bystander reading the comment thread interprets the multiple responses of #3 as coming from independent sources, when in reality they’re confounded by the call to arms in #2. I suppose Wikipedia calls it “meatpuppetry”, which amounts to the same thing, IMO.
I think people go to Slate Star Codex because that’s where Scott writes his articles, not because of the voting mechanism.
From the paper:
authors of negatively evaluated content are encouraged to post more, and their future posts are also of lower quality
Seen that at LW a few times. At some point the user’s karma became so low that they couldn’t post anymore, or perhaps an admin banned them. From my point of view, problem solved.
I think it would be useful to distinguish between systems where the downvoted comments remain visible and systems where the downvoted comments are hidden.
I read another website where the downvoted comments remain proudly visible, along with the number of downvotes, and yes, it seems to enrage the users into writing more and more of the same stuff. My hypothesis is that some people perceive downvotes as rewards (maybe they love to make people angry, or they feel they are on a crusade and the downvotes mean they successfully hurt the enemy), and these people are encouraged by downvoting. Hiding the comment, and removing the ability to comment, now that is a punishment.
A bog-standard troll wants attention and drama. Downvotes are evidence of attention and drama.
When I think others are wrong, and in particular when the groupthink is wrong, I take downvotes as a stronger indication that someone needs to get their head straight, and it could be them or me. Let’s see.
I can think of at least one case where I criticized someone for something I thought was disgraceful, after his post was massively upvoted. I was massively downvoted in turn, but eventually convinced the original poster that he had crossed a line in his original post. Or at least he so indicated. Maybe he was just humoring the crazy person.
Downvotes are a signal. Big downvotes are a big signal.
Maybe it’s not about hurting people. Maybe it’s about identifying contradiction as the place to look for bad ideas that need fixing.
“some people perceive downvotes as rewards”
Is this just a dig at people vehemently defending downvoted posts or are you serious in calling this a hypothesis?
Completely serious. Just realise that different people have different goals and/or different models of the world.
A downvote is merely a signal that “some people here don’t like this”. If you care about the opinions of LW readers and want to be liked by them, then downvotes hurt. Otherwise, they don’t.
For some sick person, making other people unhappy may be inherently desirable, and downvotes are evidence that they succeeded. Imagine some kind of psychopath who derives pleasure from frustrating strangers on the internet. (Some people suggest that this actually explains a lot of internet trolling.) Or someone may model typical LW users (or, on another forum, the typical users of forum X) as enemies whose opinions have to be opposed, and downvotes are evidence that they succeeded in writing an “inconvenient truth”. Imagine a crackpot, or a heavily mindkilled person. Or a spammer.
To trolls any attention (including downvotes) is a reward.
Tricky one. I had a look at the Facebook group and was slightly horrified. You know all the weird extrapolations-from-sequences lunacy we don’t get any more at LW? Yeah, it’s all there. I think that’s because there are no downvotes there.
That’s true, but there are other salient differences between Facebook and LessWrong. Like the fact that Facebook has a picture of your real face right there, incentivizing everyone to play nice, while we are hobbled with only aliases here. Or the absence of a nested discussion threading system on Facebook. Or the fact that Eliezer posts on Facebook all the time now and rarely here anymore. But I tend to agree that the aversiveness of karma drives people away.
My impression is that real-names-and-faces systems incentivize everyone to play to their expected audience’s biases, not to be nice. If the audience enjoys being nasty to someone, real-names-and-faces systems strongly disincentivize expressions of toleration.
The very nastiest trolls I’ve encountered really just do not give a shit. Name, address, phone number, all publicly available.
This is the “real names make people nicer online” claim, one of those ideas people keep putting forth even though there is no evidence it works this way. I say there is no evidence because every time it comes up I ask for some (particularly during the G+ nymwars) and don’t get any, but if you have some I’d love to see it.
edit: and by the way, here’s my “photo”.
Using a photograph of yourself on Facebook is optional.
I’d rather kill karma entirely than refactor it into an upvote-only system. If you’re trying to do anything more controversial than deciding which cat picture is the best, upvote-only systems encourage nasty factional behavior that I don’t want to see here: it doesn’t matter how many people you piss off as long as you’re getting strong positive reactions, so it’s in your interests to post divisive content. That in turn leads to cliques and one-upmanship and other unpleasantness. It’s a common pattern on social media, for example.
The other failure mode you get from it is lots of content-free feel-good nonsense, but we have strong enough norms against that that I don’t think it’d be a problem in the short term.
I’d be fine with that. I feel a bit silly repeating the same arguments, but we’re supposed to be striving to be, like, the most rational humans as a community, yet the social feedback system we are using was chosen … because it came packaged with Reddit, and Reddit was chosen as the LessWrong platform because it was the hot thing of its day. There was no clever Quirrell-esque design behind our karma system to bring out the best in us or protect us from the worst in us. It’s a relic. Let’s be rid of it.
No Karma 2014
Specifically:
By applying our methodology to four large online news communities for which we have complete article commenting and comment voting data (about 140 million votes on 42 million comments), we discover that community feedback does not appear to drive the behavior of users in a direction that is beneficial to the community, as predicted by the operant conditioning framework. Instead, we find that community feedback is likely to perpetuate undesired behavior. In particular, punished authors actually write worse in subsequent posts, while rewarded authors do not improve significantly.
In a footnote, they discuss what they meant by “write worse”:
One important subtlety here is that the observed quality of a post (i.e., the proportion of up-votes) is not entirely a direct consequence of the actual textual quality of the post, but is also affected by community bias effects. We account for this through experiments specifically designed to disentangle these two factors.
They measure post quality based on textual evidence by running a Mechanical Turk study on 171 comments and using that data to train a binomial regression model. So cool!
When comparing the fraction of upvotes received by a user with the fraction of upvotes given by a user, we find a strong linear correlation. This suggests that user behavior is largely “tit-for-tat”.… However, we also note an interesting deviation from the general trend. In particular, very negatively evaluated people actually respond in a positive direction: the proportion of up-votes they give is higher than the proportion of up-votes they receive. On the other hand, users receiving many up-votes appear to be more “critical”, as they evaluate others more negatively.
Incredibly interesting article. Must read.
EDIT: Consider myself updated. Therefore, I believe downvotes must be destroyed.
The main function of downvotes on LW is NOT to re-educate the offender. Their main function is to make content which has been sufficiently downvoted effectively invisible.
If you eliminate the downvotes, what will replace them to prune the bad content?
Well, if this is really the goal, then maybe disentangle downvotes from both post/comment karma and personal karma while leaving the invisibility rules in place? Make it more of a “mark as non-constructive” button: if enough people hit it, the post becomes invisible. If we want to make it more comprehensive, it could weigh these votes against upvotes when making the show/hide decision, as in the sketch below.
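To make that concrete, here is a minimal sketch of one way such a rule could work. The threshold, weight, and function name are all made-up assumptions for illustration; nothing here reflects the actual LW/Reddit code.

```python
# Hypothetical "mark as non-constructive" visibility rule.
# Thresholds, weights, and names are illustrative assumptions only.

HIDE_THRESHOLD = 5   # flags needed before a post can be hidden at all
FLAG_WEIGHT = 2.0    # how strongly one flag counts against one upvote

def is_visible(upvotes: int, non_constructive_flags: int) -> bool:
    """Return True if the post should stay visible.

    Flags never touch the author's karma; they only feed the
    show/hide decision, where they are weighed against upvotes.
    """
    if non_constructive_flags < HIDE_THRESHOLD:
        return True
    return upvotes > FLAG_WEIGHT * non_constructive_flags

print(is_visible(upvotes=12, non_constructive_flags=7))  # False: 12 <= 2.0 * 7
print(is_visible(upvotes=30, non_constructive_flags=7))  # True: 30 > 14
```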
Could be done, though it makes karma even more irrelevant to anything.
Negative externalities.
Something else? The above study is sufficient evidence for me (and hopefully others) to start finding another solution.
I am aware of the concept. What exactly do you mean?
It says “This paper investigates how ratings on a piece of content affect its author’s future behavior.” I don’t think LW should be in the business of re-educating its users to become good ’net citizens. I’m more interested in effective filtering of trolling, stupidity, aggression, drama, dick waving, drive-by character assassination, etc. etc.
It’s not like the observation that downvoting a troll does not magically convert him into a hobbit is news.
I do not like the voting and commenting system at Slate Star Codex.
It is seriously broken in many ways; I was mainly highlighting the tone, the fact that it doesn’t have a voting mechanism, and the fact that people still use it in droves despite its huge flaws.
I think that has way more to do with it being a blog with interesting posts than anything to do with the commenting system or lack of “like” buttons.
Digging into the paper, I give them an A for effort (they used some interesting methodologies), but there’s a serious problem with it that destroys many of its conclusions. Here are three different measures they used for a post’s quality:
q’: Quality as determined by blinded users given instructions on how to vote.
p: upvotes / (upvotes + downvotes)
q: Prediction for p, based on bigram frequencies of the post, trained on known p for half the dataset
q is the measure they used for most of their conclusions. Note that it is supposed to represent quality, but is based entirely on bigrams. This doesn’t pass the sniff test. Whatever q measures, it isn’t quality. At best it’s grammaticality. It is more likely a prediction of rating based on the user’s identity (individuals have identifiable bigram counts) or politics (“liberal media” and “death tax” vs. “pro choice” and “hate crime”).
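To illustrate what a purely bigram-based q amounts to, here is a rough sketch of that kind of model. This is my reconstruction with toy data and scikit-learn stand-ins, not the authors’ code, and it thresholds p into classes because a plain logistic classifier can’t fit a proportion directly.

```python
# Rough sketch of a bigram-based predictor of the upvote proportion p.
# Toy data and scikit-learn stand-ins; the paper's actual pipeline may differ.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

comments = [
    "great point, thanks for writing this",
    "you are an idiot and this is wrong",
    "interesting argument, well sourced",
    "what a stupid pile of nonsense",
]
p = [0.9, 0.2, 0.8, 0.1]  # observed fraction of upvotes for each comment

# Bigram counts are the *only* features, which is exactly the objection:
# the model sees word pairs, not "quality".
vectorizer = CountVectorizer(ngram_range=(2, 2))
X = vectorizer.fit_transform(comments)

# Crude binomial-style stand-in: threshold p and fit a logistic model.
y = [1 if frac > 0.5 else 0 for frac in p]
model = LogisticRegression().fit(X, y)

# q for a new comment = predicted probability of being upvoted.
new = vectorizer.transform(["thanks for the interesting argument"])
print(model.predict_proba(new)[0, 1])
```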
q is a prediction for p. p is a proxy for q’. There is no direct connection between q’ and q—no reason to think they will have any correlation not mediated by p.
R-squared values:
q to p: 0.04 (unless it is a typo where the paper says “mean R = 0.22” but should actually say “mean R^2 = 0.22”)
q to q’: 0.25
q’ to p: 0.12
First, the R-squared between q’ (quality as scored by the judges) and p (the community rating) is 0.12. That’s crap. It means that votes are almost unrelated to post quality.
Next, the strongest correlation is between q and q’, but the maximum R-squared they could share through a causal path is 0.04 * 0.12 = 0.0048, because there is no causal connection between them except through p.
That means that q, the machine-learned prediction they use for their study, has a non-causal correlation with q’ (post quality) that is roughly 50 times stronger than the causal path can account for.
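To spell out the arithmetic behind that ratio (my own check, assuming the simple linear-chain reading where p fully mediates any causal path from q to q’):

```python
# Check the "50 times stronger" claim from the R^2 values listed above,
# assuming corr(q, q') = corr(q, p) * corr(p, q') when p is the only link.
r2_q_p = 0.04        # q to p
r2_p_qprime = 0.12   # q' to p
r2_q_qprime = 0.25   # q to q' (observed)

max_mediated_r2 = r2_q_p * r2_p_qprime   # R^2 values multiply along the chain
print(max_mediated_r2)                   # ~0.0048
print(r2_q_qprime / max_mediated_r2)     # ~52, i.e. roughly 50 times larger
```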
In other words, all their numbers are bullshit. They aren’t produced by post quality, nor by user voting patterns. There is something wrong with how they’ve processed their data that has produced an artifactual correlation.
It would be interesting to run the voting data for LW through the analyses they made.
This paper seems to say exactly the opposite of the complaints I’ve heard from people about how posting on LessWrong is scary because they don’t want to get downvoted.