Would a Bayesian notion of “upvotes / downvotes” work better than simple upvoting / downvoting? Suppose that, instead of taking a simple sum of ups and downs, we posit some unknown latent “goodness” variable theta, which is the parameter of a Binomial distribution. Roughly, theta is the probability that a random reader of your post would upvote it. The sum of upvotes, or upvotes minus downvotes, is not a very useful piece of information, since a post with a large score in either direction could be highly controversial and simply have a huge number of voters. If you instead calculate the posterior distribution over theta (let’s say theta is modeled by a Beta distribution), then you have information about what theta is likely to be along with the degree of confidence in that estimate. Would calculating that every time someone votes be a huge strain on the backend?
The conjugate prior of the binomial distribution is the beta distribution, so if you use a beta prior for theta, the posterior is also a beta distribution. The posterior mean of theta, which is also the posterior predictive probability that the next vote is an upvote, is just (u0 + u)/(u0 + u + d0 + d), where u and d are the numbers of up- and downvotes and u0 and d0 are the parameters of the prior distribution, which act as pseudocounts.
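On the backend question, a minimal sketch of what that update could look like, assuming each comment already stores its up and down counts. The class name `VoteTally` and the uniform prior u0 = d0 = 1 are illustrative assumptions, not anything LW actually uses.

```python
from dataclasses import dataclass


@dataclass
class VoteTally:
    """Running Beta posterior over theta for one comment (hypothetical helper)."""
    up: int = 0              # observed upvotes (u)
    down: int = 0            # observed downvotes (d)
    prior_up: float = 1.0    # u0, assumed pseudocount
    prior_down: float = 1.0  # d0, assumed pseudocount

    def vote(self, is_upvote: bool) -> None:
        # Conjugacy means each new vote is a single counter increment.
        if is_upvote:
            self.up += 1
        else:
            self.down += 1

    def posterior_mean(self) -> float:
        # (u0 + u) / (u0 + u + d0 + d)
        a = self.prior_up + self.up
        b = self.prior_down + self.down
        return a / (a + b)


tally = VoteTally()
for is_up in [True, True, False, True]:
    tally.vote(is_up)
print(tally.posterior_mean())  # (1 + 3) / (1 + 3 + 1 + 1)
```

Under these assumptions the per-vote cost is one increment and one division, so the strain should be no worse than keeping the plain sum.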
You’re right, that’s in the second chapter of Gelman too. I’ll edit that.
Note that when u0 and d0 are zero, or negligible because the total number of votes is large, your posterior expectation is just u/(u+d) -- in other words, exactly the %positive that LW reports when you hover over the score.
(But in practice the total number of votes is rarely large, so the prior matters.)
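To make the small-sample point concrete, here is a rough sketch that summarizes the Beta posterior by sampling from it with the standard library. The function name, the uniform prior u0 = d0 = 1, the 90% interval, and the vote counts are all assumptions picked for illustration.

```python
import random


def posterior_summary(up, down, prior_up=1.0, prior_down=1.0,
                      draws=20000, seed=0):
    """Return (mean, 5th percentile, 95th percentile) of the Beta posterior."""
    a = prior_up + up
    b = prior_down + down
    rng = random.Random(seed)  # fixed seed so repeated calls agree
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lo = samples[int(0.05 * draws)]
    hi = samples[int(0.95 * draws)]
    return a / (a + b), lo, hi


few = posterior_summary(up=3, down=0)       # raw fraction is 3/3
many = posterior_summary(up=300, down=100)  # raw fraction is 300/400

for label, (mean, lo, hi) in [("3 up / 0 down", few), ("300 up / 100 down", many)]:
    print(f"{label}: mean={mean:.3f}, 90% interval=({lo:.3f}, {hi:.3f})")
```

With three votes the prior pulls the estimate from 100% positive down to 0.8 and the interval stays wide; with four hundred votes the prior barely moves the estimate and the interval is narrow.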
ISTR reading an article about how reddit’s “best” sorting worked, and I would have described it roughly like that.
Aha, see http://www.evanmiller.org/how-not-to-sort-by-average-rating.html via https://redditblog.com/2009/10/15/reddits-new-comment-sorting-system/. I, uh, don’t actually understand it. It’s possible I read the text and made up a thing that seemed like it would do something like what the text sounded like it did.
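For what it's worth, the score the evanmiller.org article recommends is the lower bound of the Wilson score confidence interval for the upvote proportion. A minimal sketch of that formula follows; the function name and the default z = 1.96 (roughly 95% confidence) are my assumptions, and this is not taken from reddit's actual code.

```python
import math


def wilson_lower_bound(up: int, down: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the upvote proportion."""
    n = up + down
    if n == 0:
        return 0.0  # assumed convention for comments with no votes
    p = up / n
    z2 = z * z
    centre = p + z2 / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z2 / (4 * n)) / n)
    return (centre - margin) / (1 + z2 / n)


print(wilson_lower_bound(1, 0))    # one vote, 100% positive
print(wilson_lower_bound(80, 20))  # a hundred votes, 80% positive
```

The effect is similar in spirit to the pseudocount approach: a comment with a single upvote scores below one with 80 upvotes out of 100, even though its raw %positive is higher.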
There are two separate issues: what to display and how to sort comments.
LessWrong displays the net number of positive votes; and, if you hover your mouse over the score, also the proportion of upvotes.
It offers several ways to sort the comments, mainly copied from Reddit, which now offers fewer ways. Go up to the top of a post. Just above the comment box, on the right but below the tags, is a triangle and the words “Sort By,” probably “Sort By: Best.” Click on the triangle and you can choose among Best, Popular, New, Controversial, Top, Old, and Leading. I think Top is net score. I’m not sure what the difference is between Popular, Best, and Leading. I suspect Leading is closest to what you suggest. Once you make a choice, all posts will be displayed that way until you choose again.
Here’s a thought: Weight votes according to how often the voter votes the same way you do.
It would neuter the effectiveness of serial downvoting, while simultaneously encouraging more participation. Your votes would benefit yourself as well as others, by training the system.
This is an algorithm for producing filter bubbles, rather than for discovering or implementing community norms.
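A toy sketch of the weighting proposal, to make both the idea and the objection concrete. Everything here is assumed for illustration: votes are +1 or -1, the weight is the fraction of commonly voted items on which two users agree, users with no overlap get a neutral 0.5, and all the names are hypothetical.

```python
def agreement_weight(viewer_votes, other_votes):
    """Fraction of commonly voted items on which two users voted the same way."""
    shared = viewer_votes.keys() & other_votes.keys()
    if not shared:
        return 0.5  # assumed neutral weight when there is no overlap
    same = sum(viewer_votes[item] == other_votes[item] for item in shared)
    return same / len(shared)


def personalized_score(item, viewer, votes_by_user):
    """Sum of other users' votes on `item`, weighted by agreement with `viewer`."""
    viewer_votes = votes_by_user[viewer]
    score = 0.0
    for user, votes in votes_by_user.items():
        if user == viewer or item not in votes:
            continue
        score += agreement_weight(viewer_votes, votes) * votes[item]
    return score


votes_by_user = {
    "alice": {"c1": 1, "c2": -1},
    "bob": {"c1": 1, "c2": -1, "c3": 1},
    "mallory": {"c1": -1, "c2": 1, "c3": -1},  # votes against alice every time
}

print(personalized_score("c3", "alice", votes_by_user))  # bob counts fully, mallory not at all
```

In this toy data mallory's downvote on c3 has no effect on what alice sees, which is the serial-downvoting benefit. The score is also different for every viewer, which is the filter-bubble worry.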