New “Best” comment sorting system

matt 2 Jul 2012 11:08 UTC

35 points

Way back in October 2009 Reddit introduced their “Best” comment sorting system. We’ve just pulled those changes into Less Wrong. The changes affect only comments, not stories.

It’s good. It should significantly improve the visibility of good comments posted later in the life of an article. You (yes you) should adopt it. It’s the default for new users.

See http://blog.reddit.com/2009/10/reddits-new-comment-sorting-system.html for the details.

[Image: location of the “Best” comment sorting option]

Site Meta

Kaj_Sotala 2 Jul 2012 18:40 UTC
18 points
Short version of how this is different, for those too lazy to click on the link: if you sort by “top”, comments get sorted in a simple “the ones with the highest score go on top” order. This has the problem that it favors comments that were posted early on, since they’re the ones that people see first and they’ve had a lot of time to gather upvotes. A good comment that’s posted late might get stuck near the bottom because few people ever scroll all the way down to upvote it.

“Best” uses some statistical magic to fix that:

If everyone got a chance to see a comment and vote on it, it would get some proportion of upvotes to downvotes. This algorithm treats the vote count as a statistical sampling of a hypothetical full vote by everyone, much as in an opinion poll. It uses this to calculate the 95% confidence score for the comment. That is, it gives the comment a provisional ranking that it is 95% sure it will get to. The more votes, the closer the 95% confidence score gets to the actual score.

If a comment has one upvote and zero downvotes, it has a 100% upvote rate, but since there’s not very much data, the system will keep it near the bottom. But if it has 10 upvotes and only 1 downvote, the system might have enough confidence to place it above something with 40 upvotes and 20 downvotes—figuring that by the time it’s also gotten 40 upvotes, it’s almost certain it will have fewer than 20 downvotes. And the best part is that if it’s wrong (which it is 5% of the time), it will quickly get more data, since the comment with less data is near the top—and when it gets that data, it will quickly correct the comment’s position. The bottom line is that this system means good comments will jump quickly to the top and stay there, and bad comments will hover near the bottom. (Picky readers might observe that some comments probably get a higher rate of votes, up or down, than others, which this system doesn’t explicitly model. However, any bias which that introduces is tiny in comparison to the time bias which the system removes, and comments which get fewer overall votes will stay a bit lower anyway due to lower confidence.)

Not sure I fully understood that either. But they say it works well, so I guess I’ll trust them!
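
A rough sketch of the “95% confidence score” described in the quoted passage, for anyone who wants to see the arithmetic. It assumes the score is the lower bound of a Wilson score interval on the upvote fraction, which is my reading of the linked Reddit post, and it assumes z = 1.96 for 95% confidence. The name `wilson_lower_bound` is made up for illustration; this is not Reddit's or Less Wrong's actual code.

```python
import math


def wilson_lower_bound(upvotes: int, downvotes: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the upvote fraction.

    z = 1.96 is an assumed value for a 95% interval; a site could use another.
    """
    n = upvotes + downvotes
    if n == 0:
        # No votes yet: nothing to estimate, so score the comment at 0.
        return 0.0
    p_hat = upvotes / n  # observed upvote fraction
    z2 = z * z
    centre = p_hat + z2 / (2 * n)
    margin = z * math.sqrt((p_hat * (1 - p_hat) + z2 / (4 * n)) / n)
    return (centre - margin) / (1 + z2 / n)


# The three cases mentioned in the quoted passage.
for ups, downs in [(1, 0), (10, 1), (40, 20)]:
    print(ups, downs, round(wilson_lower_bound(ups, downs), 3))
```

Under these assumptions the scores come out at roughly 0.21 for 1 up / 0 down, 0.62 for 10 up / 1 down, and 0.54 for 40 up / 20 down, which matches the ordering the passage describes: the single-vote comment stays low, and the 10/1 comment ranks above the 40/20 one.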
- pjeby 2 Jul 2012 20:03 UTC
  10 points
  Parent
  
  “Best” uses some statistical magic to fix that:
  
  I’m curious whether the math still works correctly on a site where the default karma is 1 instead of 0. But since it’s magic to start with, I guess “meh”. Let’s just not use it to calculate CEV or anything. ;-)
- jsalvatier 3 Jul 2012 2:28 UTC
  4 points
  Parent
  I think what they’re doing is statistical inference for the fraction upvotes/total_votes. I’m not sure this is the best possible model, but it seems to have worked well enough.
  
  I suspect they’re taking the mean of the 95% confidence interval, but I’m not sure. There’s actually a pretty natural way to do this more rigorously in a Bayesian framework, called hierarchical modeling (similar to this), but it can be complex to fit such a model.
  
  Edit: However, a simpler Bayesian approach would be to do inference for a proportion using a ‘reasonable’ prior for the proportion (one that approximates the actual distribution of proportions), expressed as a Beta distribution (this makes the math easy). Come to think of it, this would actually be pretty easy to implement. You could even fit a full hierarchical model to a data set and then use the prior for the proportion you get from that in your algorithm. The advantage is that you can fit the full hierarchical model offline in R, which avoids repeating expensive computations and writing your own fitting code. The rest of the math is very simple. This idea is simple enough that I bet someone else has done it.
  - omslin 3 Jul 2012 6:45 UTC
    8 points
    Parent
    If you use the Bayes approach with a Beta(x,y) prior, all you do is, for each post, add x to the number of upvotes, add y to the number of downvotes, and then compute the percentage of votes that are upvotes. [1]
    
    In my college AI class we used this exact method with x=y=1 to adjust for low sample size. Someone should swap out the clunky frequentist method reddit apparently uses for this Bayesian method!
    
    [1] This seems to be what it says in the pdf.
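
The Beta-prior rule in this subthread is short enough to write out. A minimal sketch, assuming the ranking score is the posterior mean of the upvote proportion under a Beta(x, y) prior; the name `beta_posterior_mean` is illustrative, and the default x = y = 1 is taken from the comment above.

```python
def beta_posterior_mean(upvotes: int, downvotes: int,
                        x: float = 1.0, y: float = 1.0) -> float:
    """Posterior mean of the upvote proportion under a Beta(x, y) prior.

    Equivalent to adding x pseudo-upvotes and y pseudo-downvotes and then
    taking the fraction of votes that are upvotes.
    """
    return (upvotes + x) / (upvotes + downvotes + x + y)


# Same vote counts as the Reddit example quoted earlier on the page.
for ups, downs in [(1, 0), (10, 1), (40, 20)]:
    print(ups, downs, round(beta_posterior_mean(ups, downs), 3))
```

With x = y = 1, a comment with 1 upvote and 0 downvotes scores 2/3, one with 10 up and 1 down scores 11/13, and one with 40 up and 20 down scores 41/62. The single-vote comment therefore ends up level with the 40/20 comment, not below it, so a posterior mean with a weak prior penalizes small samples less than a lower confidence bound does. Larger x and y would shrink low-vote comments more strongly toward the prior.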
Alicorn 2 Jul 2012 16:46 UTC
7 points
How do the “Best”, “Popular”, and “Top” algorithms work?
- albeola 2 Jul 2012 18:22 UTC
  7 points
  Parent
  Ironically, it appears the new algorithm is frequentist.
  - matt 2 Jul 2012 23:00 UTC
    5 points
    Parent
    Bayesian reformulations welcome.
    - albeola 4 Jul 2012 8:07 UTC
      1 point
      Parent
      Apologies — I should have taken reinforcement into account and noted that the new algorithm is probably still a lot better than the previous one.
    - jsalvatier 3 Jul 2012 3:02 UTC
      0 points
      Parent
      This seems like a neat problem. Would it be hard to go from a Python function that takes a set of comment upvote/downvote counts and returns a ranking to a comment sorting option, given that I don’t know much about the Reddit internals?
      
      Also, would it be difficult to get a real dataset of comment counts from LW?
- Kaj_Sotala 2 Jul 2012 18:46 UTC
  2 points
  Parent
  “Top” simply calculates (number of upvotes − number of downvotes) and puts the comments with the highest result on top.
  
  I think “Popular” tries to favor comments that don’t have many downvotes or something, I’m not sure.
  
  “Best” apparently works by magic.
  - matt 2 Jul 2012 22:56 UTC
    2 points
    Parent
    I think “Popular” adds weight to recent comments. This seems to be a much worse way of achieving what “Best” shoots for.
    - Elund 3 Nov 2014 2:00 UTC
      0 points
      Parent
      
      This seems to be a much worse way of achieving what “Best” shoots for.
      
      Not necessarily. Someone who has already seen the best comments and returns a while later to see what new but good comments have been posted may have a use for it.
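
For comparison, the “Top” ordering described in this thread is a plain sort on net score. A small sketch with made-up comments (the `Comment` type, the labels, and the function names are hypothetical), reusing the 10-up/1-down and 40-up/20-down figures from the Reddit example quoted elsewhere on the page:

```python
from typing import List, NamedTuple


class Comment(NamedTuple):
    label: str
    upvotes: int
    downvotes: int


def top_score(comment: Comment) -> int:
    """Net score as described for "Top": upvotes minus downvotes."""
    return comment.upvotes - comment.downvotes


def rank_by_top(comments: List[Comment]) -> List[Comment]:
    """Return the comments sorted by net score, highest first."""
    return sorted(comments, key=top_score, reverse=True)


comments = [
    Comment("late", 10, 1),    # net score 9
    Comment("early", 40, 20),  # net score 20
    Comment("new", 1, 0),      # net score 1
]
print([c.label for c in rank_by_top(comments)])
```

Under “Top” the 40/20 comment (net 20) sits above the 10/1 comment (net 9). That is the ordering the quoted Reddit passage says “Best” may reverse once it is confident enough in the 10/1 comment.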
dbaupp 2 Jul 2012 11:59 UTC
4 points
Yay! Thanks Matt and the tricycle team (and anyone else) for continuing to improve LW.
- matt 2 Jul 2012 22:59 UTC
  2 points
  Parent
  Work done by John Simon, and integrated by Wes.
  - Kaj_Sotala 3 Jul 2012 7:25 UTC
    0 points
    Parent
    Thanks, John and Wes!
Jayson_Virissimo 2 Jul 2012 12:50 UTC
2 points
Thanks Matt!
Axel 2 Jul 2012 12:03 UTC
1 point
Thank you for taking the time to implement this; I’ve set it as my “sort by” criterion.
vollmer 22 Nov 2014 19:33 UTC
0 points
For me, this is not working for some of the posts,

e.g. http://lesswrong.com/lw/kd/pascals_mugging_tiny_probabilities_of_vast/?sort=top
Elund 3 Nov 2014 2:42 UTC
0 points
I found a Reddit thread explaining the different comment sorting systems. Does LW use the same algorithms for each method?

http://www.reddit.com/r/TheoryOfReddit/comments/1y8rst/what_is_the_best_way_to_sort_top_best_new/

Missing from their list, though, are “popular” and “leading” (and “old”, but that’s pretty self-explanatory). I’m guessing “popular” is the same thing as “hot”, judging by what appears in my address bar when I sort that way. “Leading” is listed as “interestingness” in the address bar, which leads me to think it adds weight to comments that inspire a lot of discussion. My observations suggest that it also factors in votes, though. Could someone please clarify what these algorithms do?
saturn 4 Jul 2012 0:04 UTC
0 points
I’ve noticed that the “Best” sorting sometimes puts strongly downvoted comments (score −5 or less) above comments with scores closer to zero. Is this intentional or a bug?
Sabiola 2 Jul 2012 13:57 UTC
0 points
What is the difference between Best, Popular, and Top?