I’ve asked someone trusted to try to write a program to detect mass-downvoting and even check particular individuals, but we haven’t been able to find anything! It’s possible that the database export we’re getting from the server admins is incomplete? I don’t know.
What have you/they been trying to do? Unsupervised detection of mass-downvoting, or exploration of particular specific cases alleged to have occurred? If the latter, do the records not show the downvotes at all, or do they show downvotes from many different individuals in each case?
(Does the database contain information about when any given downvote happened? I guess probably not, which makes diagnosis more difficult.)
For example: I’ve been having ~5-10 recent and older comments downvoted per day, I think all during UK night-time or early morning (i.e., roughly 4pm to 2am Pacific), most days (I think all) for about the last 4 or 5. Approximately all of the recent-ish comments I’ve checked appear to have been downvoted exactly once. (A few very recent ones haven’t been. A couple have been downvoted more than once; I guess that they were genuinely disliked on their (de)merits.)
If someone has the time to look, it would be interesting to know: Do the records show that those comments have been downvoted, and by whom? One downvoter, or a few, or many? Any signs of sockpuppetry, if many?
One possibility (though an unappealing one, not a very likely one a priori, and one that I feel a bit paranoid even mentioning) is a sort of downvoting ring of people willing to cooperate on downvoting at a rate just slow enough to avoid suspicion. That would be bad. And also sad.
[EDITED to add: I’d offer to help with the investigating, but as an interested party I probably shouldn’t.]
[EDITED again to add: at 2014-02-15 08:41 GMT, it looks as if I haven’t had a pile of downvotes in the last 8-10 hours. Maybe whoever it is has got bored, or maybe they’ve noticed evidence of someone looking. Or, of course, maybe I’m making the whole thing up, but anyone who finds that likely and cares to check can look at my comment history and see the evidence.]
My last 100 comments contain exactly one comment that is not downvoted.
I counted falenas108’s and he has at least 50 comments in a row without one comment that is not downvoted.
Starting with his first downvoted comment before the date of the above post, gwm has a string of 38 comments that are all downvoted, yet of his past 5 most recent comments (after he says the downvoter gave up) there are no downvotes.
It’s obvious that we’re all victims of mass downvoting, and whatever Eliezer did didn’t work. The system has to at least keep track of who downvoted which post, and it shouldn’t be too hard for anyone with database access to get a count.
I suggest a simple change: for any logged-in user’s own comments, display the name(s) of the people who downvoted him. I suspect that would fix the problem.
This would prevent mass downvoting, but at the cost of making votes socially significant. That’s a bigger deal than it sounds like; it means that you can schmooze people by downvoting their enemies’ comments (or by upvoting their own, if you extend it to that), and that you can incur the wrath of influential users if you dare to downvote theirs.
If you think there’s too much politics in voting patterns here already, I guarantee this will make it ten times worse. About the best thing that could happen is people converging on a Facebook-style upvote-only pattern (maybe with exceptions for obvious trolls), and I still view that as distinctly inferior to Reddit’s format for the purpose of promoting quality discussion.
How about the alternative of showing the name if greater than X percent of the user’s last Y comments have been voted down by the same person? If you downvote 98 of a user’s last 100 comments and they’re not a blatant troll, you probably deserve to earn their wrath, influential or not.
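A minimal sketch of that threshold rule in Python, to make it concrete. The input format (one set of downvoter names per recent comment) and the function name are my own assumptions for illustration; I don’t know how LW actually stores votes.

```python
from collections import Counter

def visible_downvoters(recent_downvoters, x_percent):
    """Return {name: count} for voters who downvoted more than x_percent
    of the comments in recent_downvoters.

    recent_downvoters: list with one set of downvoter names per comment,
    covering the author's last Y comments (hypothetical input format).
    """
    y = len(recent_downvoters)
    if y == 0:
        return {}
    counts = Counter(name for voters in recent_downvoters for name in voters)
    return {name: n for name, n in counts.items() if 100.0 * n / y > x_percent}

# Made-up example: one account hits 98 of the last 100 comments,
# another downvotes 4 of them.
history = [{"EvilDownvoter"} for _ in range(98)] + [set(), set()]
for voters in history[:4]:
    voters.add("casual_reader")

print(visible_downvoters(history, x_percent=90))
```

With X = 90 and Y = 100, only the account behind 98 of the 100 downvotes would be named; the occasional downvoter stays anonymous.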
If you don’t want to do even that, then how about this instead: for each comment that was modded down, have a button to click for “moderation statistics”. If you click on the button it will say something like “This comment was modded down by 2 users. User 1 has modded down 4 of your past 100 posts. User 2 has modded down 99 of your past 100 posts”. Suspicious numbers like 99 out of 100 can be grounds for contacting an admin. There’s nothing like having an actual human being doing actual adminning. This solution would also prevent someone from gaming the system by noticing that a name is displayed at 95% so he only mods down 94% of someone’s posts.
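Roughly what the “moderation statistics” button could compute, as a sketch only. The data layout and names below are hypothetical; the voters are shown as “User 1”, “User 2” so that names stay hidden, as in the proposal.

```python
from collections import Counter

def moderation_statistics(comment_voters, recent_history):
    """Build the anonymised report for one downvoted comment.

    comment_voters: set of accounts that downvoted the clicked comment.
    recent_history: list of sets of downvoters, one per recent post
                    by the same author (hypothetical input format).
    """
    totals = Counter(v for voters in recent_history for v in voters)
    n_posts = len(recent_history)
    n_voters = len(comment_voters)
    noun = "user" if n_voters == 1 else "users"
    lines = [f"This comment was modded down by {n_voters} {noun}."]
    # Order by how often each voter appears, so labels are stable
    # and do not leak account names.
    ranked = sorted(comment_voters, key=lambda v: (totals[v], v))
    for i, voter in enumerate(ranked, start=1):
        lines.append(
            f"User {i} has modded down {totals[voter]} of your past {n_posts} posts."
        )
    return lines

# Made-up data matching the numbers in the comment above.
history = [{"b"} for _ in range(99)] + [set()]
for voters in history[:4]:
    voters.add("a")

for line in moderation_statistics({"a", "b"}, history):
    print(line)
```

Because the report shows raw counts rather than a yes/no flag, there is no display threshold for a downvoter to stay just under.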
At that point you might as well just write moderation tools for it, without requiring the user input step. Which wouldn’t be a bad idea in theory, but it runs into the usual LW bottleneck of development time.
All that’s actually needed in my case is an active admin that I can tell “this is an extremely suspicious pattern; please check it out”. Having a button to display moderation statistics is just a way to make it harder for the admin to rationalize away not doing any adminning (or looking at it from the other side, for the user to be able to prove to the admin that the problem is worth taking the time to look into).
I suggest a simple change: for any logged-in user’s own comments, display the name(s) of the people who downvoted him. I suspect that would fix the problem.
How?
Let us say you suddenly discover that a user called (say) EvilDownvoter had been downvoting all your posts. How exactly does that help stop him?
If they’re also posting comments, revealing what they are doing would discredit them as a legitimate commentator, especially if history shows that they have an argument with me that they are trying to settle by forcing me off the site.
If they’re not posting comments, that means they have a single purpose account, which is an obvious troll.
It would be possible to complain about them to an admin by name rather than complaining based on a statistical analysis of one’s posts. It would be much harder for an admin to justify inaction, and much more likely for him to lose status given inaction, than if no name could be provided.
Availability bias and related biases would make it easier to gain sympathy from others if the situation is easier to understand (no need to complain about Poisson distributions) and more specific (has a name attached).
Can you give us some details on how the votes are stored in the server? This may be difficult/impossible to do in an offline fashion if the right sort of data isn’t available.
I suspect that, as with site modifications, those of us suggesting ways to find downvote stalkers would do best to figure out how LW works and do as much of the work as possible ourselves. In this case, that would probably mean downloading the LW source code, figuring out the database structure, thinking of approaches to finding downvote stalkers, formalising them as database queries, and then getting someone with database access to security-check and run those queries. I suspect this because, from what I gather, Eliezer and those with database access (e.g. presumably Trike) are busy enough, or doing important enough other things, that they are unwilling to do all this themselves or it is not worth their time. So we should do as much of it as possible to make things quicker for them.
Small amount of money to mouth: I did read through some of the webpages surrounding LW’s source code, downloaded it, and spent a little time trying to figure out how the site and database work. But by the time I got to the point of looking at the code, I had little motivation left, and working out how the scripts relate to each other and where to start was hard enough that I didn’t get very far before I burned out for the night. I haven’t looked again since. :z
A guide to (learning) LW’s code and database (even if just a few paragraphs along the lines of ‘Start by looking at the main article display script, then move on to...’ or commenting the scripts or something) might be higher leverage at this point with respect to improving the site than submitting small code improvements, since it might encourage several others to submit improvements. On the other hand, part of me suspects that the set of people held back just by that might actually be quite small (polarisation of would-be contributors into hardcore and indifferent with few in the middle—‘if they were going to do it, they would have done it by now’).
Given the distribution of coding ability here, it certainly seems ridiculous how slow stuff like this gets done, and I think it’s due to trivial inconveniences, ugh fields, etc., of which figuring out the site and how to submit code etc. is possibly a large part.
Since Eliezer’s response, I have slightly lowered my estimate of how much downvote stalking is going on, but there is still way too much evidence for me to honestly believe that there aren’t any downvote stalkers; it would take at least an explanation of exactly what had been tried and possibly significant knowledge of database structure to convince me it’s not happening at this point. So at present I defy the data.
The server has to explicitly remember every vote from every user; otherwise the interface that lets anybody change or retract any of their past votes would not be possible.
Right- but if it doesn’t have a timestamp, then it’s difficult to determine whether or not one user downvoted another user many times in a few minutes, which is a more reliable sign of the karmassassination problem than just how many times one user has downvoted another user.
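If vote timestamps do turn out to exist, the “many downvotes in a few minutes” check could look something like the sketch below. Everything here is an assumption: the `(voter, author, cast_at)` record layout, the function name, and the ten-minute window are all made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def max_burst(votes, window=timedelta(minutes=10)):
    """For each (voter, author) pair, return the largest number of
    downvotes that fall inside any single time window.

    votes: iterable of (voter, author, cast_at) downvote records
           (hypothetical layout; cast_at is a datetime).
    """
    by_pair = defaultdict(list)
    for voter, author, cast_at in votes:
        by_pair[(voter, author)].append(cast_at)

    result = {}
    for pair, times in by_pair.items():
        times.sort()
        best, start = 0, 0
        # Sliding window over the sorted timestamps.
        for end, t in enumerate(times):
            while t - times[start] > window:
                start += 1
            best = max(best, end - start + 1)
        result[pair] = best
    return result

# Made-up data: X downvotes Y thirty times in five minutes,
# Z downvotes Y three times on three different days.
base = datetime(2014, 2, 15, 3, 0, 0)
votes = [("X", "Y", base + timedelta(seconds=10 * i)) for i in range(30)]
votes += [("Z", "Y", base + timedelta(days=i)) for i in range(3)]

bursts = max_burst(votes)
print(bursts[("X", "Y")], bursts[("Z", "Y")])
```

A high burst count separates the rapid-fire pattern from a reader who disagrees with someone a few times over several days, which the raw downvote total cannot do.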
You could go off comment timestamp- “has user X downvoted a contiguous block of comments from user Y, or are there holes (i.e. comments user X did not downvote)?”- but that’s less useful, and more likely to catch the false positives of norm-enforcing users downvoting a repeated norm-breaker.
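The fallback check that uses only comment order can be sketched as follows. Again the inputs are hypothetical: a list of user Y’s comment ids in posting order, and the set of those that user X downvoted.

```python
def longest_downvoted_run(comments_in_order, downvoted_ids):
    """Return (longest contiguous run of downvoted comments,
    number of holes between the first and last downvoted comment).

    comments_in_order: author's comment ids sorted by posting time.
    downvoted_ids: set of comment ids that one voter downvoted.
    """
    longest = current = 0
    for cid in comments_in_order:
        current = current + 1 if cid in downvoted_ids else 0
        longest = max(longest, current)

    hits = [i for i, cid in enumerate(comments_in_order) if cid in downvoted_ids]
    if not hits:
        return 0, 0
    span = hits[-1] - hits[0] + 1
    holes = span - len(hits)
    return longest, holes

comments = list(range(1, 11))  # ten comments by user Y, oldest first
print(longest_downvoted_run(comments, {3, 4, 5, 6, 7}))  # solid block
print(longest_downvoted_run(comments, {2, 5, 9}))        # scattered
```

A long run with no holes is the suspicious shape, but as noted above it also matches someone consistently downvoting a repeat norm-breaker, so this can only shortlist pairs for a human to look at.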
The past 80+ comments from me have all had at least one downvote. There is no reasonable way to interpret this other than as having a stalker.
And the solution to how not to catch false positives is to use some common sense. You’re never going to have an automated algorithm that can detect every instance of abuse, but even an instance that is not detectable by automatic means can be detectable if someone with sufficient database access takes a look when it is pointed out to them.
There is no reasonable way to interpret this other than as having a stalker.
Suppose we find the list of users who downvoted your recent comments, and there are fifteen users on that list, each of whom is an active poster in their own right. What conclusion would you draw from that?
(It may be that, when we actually find that list, there is one account, or a handful of mostly inactive accounts, that represent almost all of the downvotes, in which case ‘stalker’ is a reasonable conclusion. But it’s not the only way the data could turn out.)
And the solution to how not to catch false positives is to use some common sense.
Common sense is costly. The point of doing this algorithmically is that you get a query result that says “these are the twenty cases that might be karmassassination” instead of “these are the twenty thousand cases that might be karmassassination” or “these are the zero cases that might be karmassassination.”
It’s also not particularly wise to run this check just on people who complain- part of the point of this is to prevent karmassassins from driving users away, which hasn’t happened to the people who stuck around to complain (somewhat)- and at least a few users have a habit of downvoting any comments complaining about karma loss because they don’t like comments that complain about karma loss, and so they’ll be extra likely to show up on that list.
Suppose we find the list of users who downvoted your recent comments, and there are fifteen users on that list, each of whom is an active poster in their own right. What conclusion would you draw from that?
I’d conclude that this is an extremely weird statistical anomaly which is not one user moderating down comments, but looks almost exactly like it is. One user doing a lot of downmods has to apply the downmods to separate comments, so his downmods are spread out. 15 users producing the same total number of downmods independently of each other would produce something a lot closer to a Poisson distribution with an expected value of 1, and there should be a number of comments that have zero downmods just by chance.
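To put rough numbers on the Poisson argument: if downvotes landed independently with a mean of 1 per comment, the chance that any given comment escapes is e^(-1), about 37%. The sketch below works out what that implies for a run of 80 comments (the figure mentioned earlier in the thread). Independence is an assumption of the illustration, not a fact about how people vote.

```python
import math

def prob_every_comment_hit(n_comments, mean_downvotes):
    """Probability that all n_comments receive at least one downvote,
    if downvotes per comment are independent Poisson(mean_downvotes)."""
    p_zero = math.exp(-mean_downvotes)  # P(a comment gets no downvotes)
    return (1.0 - p_zero) ** n_comments

n = 80
mean = 1.0
expected_untouched = n * math.exp(-mean)
p_all = prob_every_comment_hit(n, mean)

print(f"expected comments with zero downvotes: {expected_untouched:.1f}")
print(f"probability all {n} are downvoted: {p_all:.1e}")
```

Under these assumptions you would expect roughly 29 of the 80 comments to have no downvotes at all, and the probability of every one being hit is on the order of 1e-16. Real votes are not independent (a bad comment attracts several), so this illustrates the shape of the argument rather than proving anything.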
And the solution to how not to catch false positives is to use some common sense. You’re never going to have an automated algorithm that can detect every instance of abuse, but even an instance that is not detectable by automatic means can be detectable if someone with sufficient database access takes a look when it is pointed out to them.
Right on. The solution to karma abuse isn’t some sophisticated algorithm. It’s extremely simple database queries, in plain English along the lines of “return list of downvotes by user A, and who was downvoted,” “return downvotes on posts/comments by user B, and who cast the vote,” and “return lists of downvotes by user A on user B.”
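Those three queries might look like this against a toy schema. To be clear, the `comments` and `votes` tables below are invented for the example and run in an in-memory SQLite database; I don’t know what LW’s real schema looks like, so treat this as a sketch of the idea only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema, not LW's actual one.
conn.executescript("""
CREATE TABLE comments (id INTEGER PRIMARY KEY, author TEXT NOT NULL);
CREATE TABLE votes (
    voter      TEXT    NOT NULL,
    comment_id INTEGER NOT NULL REFERENCES comments(id),
    direction  INTEGER NOT NULL  -- +1 upvote, -1 downvote
);
""")
conn.executemany("INSERT INTO comments VALUES (?, ?)",
                 [(1, "B"), (2, "B"), (3, "B"), (4, "C")])
conn.executemany("INSERT INTO votes VALUES (?, ?, ?)",
                 [("A", 1, -1), ("A", 2, -1), ("A", 3, -1),
                  ("A", 4, 1), ("D", 2, -1)])

def downvotes_cast_by(conn, voter):
    """Downvotes by one user, counted per author who was downvoted."""
    return conn.execute("""
        SELECT c.author, COUNT(*) FROM votes v
        JOIN comments c ON c.id = v.comment_id
        WHERE v.voter = ? AND v.direction < 0
        GROUP BY c.author ORDER BY COUNT(*) DESC, c.author
    """, (voter,)).fetchall()

def downvotes_received_by(conn, author):
    """Downvotes on one user's comments, counted per voter."""
    return conn.execute("""
        SELECT v.voter, COUNT(*) FROM votes v
        JOIN comments c ON c.id = v.comment_id
        WHERE c.author = ? AND v.direction < 0
        GROUP BY v.voter ORDER BY COUNT(*) DESC, v.voter
    """, (author,)).fetchall()

def downvotes_between(conn, voter, author):
    """Ids of the comments by `author` that `voter` downvoted."""
    rows = conn.execute("""
        SELECT c.id FROM votes v
        JOIN comments c ON c.id = v.comment_id
        WHERE v.voter = ? AND c.author = ? AND v.direction < 0
        ORDER BY c.id
    """, (voter, author)).fetchall()
    return [row[0] for row in rows]

print(downvotes_cast_by(conn, "A"))
print(downvotes_received_by(conn, "B"))
print(downvotes_between(conn, "A", "B"))
```

The queries only produce counts; deciding whether a given count means abuse or ordinary disagreement is still a judgment call for whoever runs them.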
It’s extremely simple database queries, in plain English along the lines of “return list of downvotes by user A, and who was downvoted,” “return downvotes on posts/comments by user B, and who cast the vote,” and “return lists of downvotes by user A on user B.”
And then what will you do with that data? If you find that GrumpyCat666 cast most of the downvotes, does that mean that GrumpyCat666 is a karmassassin, or that GrumpyCat666 is one of the gardeners?
(I can’t find the link now, but early on there was a coded rule to prevent anyone from casting more downvotes than their total karma. This stopped a user whose name I don’t recall, who had downvoted some massive fraction of all the comments the site had received, from downvoting any more comments, but that was seen as unhelpful for the site, since that person was making the junk less visible.)
I’ve asked someone trusted to try to write a program to detect mass-downvoting and even check particular individuals, but we haven’t been able to find anything! It’s possible that the database export we’re getting from the server admins is incomplete? I don’t know.
Huh. Now that someone has been caught very much doing this, did you find out why you couldn’t detect it before?
Many thanks for looking into this!
This is exactly the pattern for my downvoting too.