I think they’re doing statistical inference for the fraction upvotes/total_votes. I’m not sure this is the best possible model, but it seems to have worked well enough.
I suspect they’re taking the mean of the 95% confidence interval, but I’m not sure. There’s actually a pretty natural way to do this more rigorously in a Bayesian framework, called hierarchical modeling (similar to this), but it can be complex to fit such a model.
Edit: However, a simpler Bayesian approach would be to do inference for a proportion using a ‘reasonable’ prior (one that approximates the actual distribution of proportions) expressed as a Beta distribution, which makes the math easy because the Beta is conjugate to the binomial. Come to think of it, this would actually be pretty easy to implement. You could even fit a full hierarchical model to a data set and then use the prior for the proportion you get from that in your algorithm. The advantage is that you can fit the full hierarchical model offline in R, so you avoid repeating the expensive computation and don’t have to write the fitting code yourself. The rest of the math is very simple. This idea is simple enough that I bet someone else has done it.
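For anyone who wants a quick stand-in for the offline fit, here is a rough Python sketch. To be clear, this is not a hierarchical model: it swaps in a plain method-of-moments fit of a Beta distribution to historical per-post upvote fractions, and it ignores the fact that posts with few votes give noisier fractions. The function name `beta_prior_from_proportions` and the `historical` numbers are made up for illustration.

```python
from statistics import mean, variance


def beta_prior_from_proportions(proportions):
    """Method-of-moments estimate of Beta(x, y) from observed proportions.

    A crude substitute for a full hierarchical fit: it matches the mean and
    sample variance of the proportions and nothing else.
    """
    m = mean(proportions)
    v = variance(proportions)  # sample variance, needs at least 2 values
    if v <= 0 or v >= m * (1 - m):
        raise ValueError("proportions are not compatible with a Beta fit")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


# Hypothetical upvote fractions from past posts (illustrative data only).
historical = [0.55, 0.60, 0.70, 0.75, 0.80, 0.65]
x, y = beta_prior_from_proportions(historical)
print(f"prior is roughly Beta({x:.2f}, {y:.2f})")
```

The resulting x and y would then be the fixed prior parameters used in the ranking code, so nothing expensive has to run per vote.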
If you use the Bayes approach with a Beta(x,y) prior, all you do for each post is add x to the # of upvotes, add y to the # of downvotes, and then compute the fraction of votes that are upvotes, i.e. (upvotes + x) / (upvotes + downvotes + x + y), which is the posterior mean. [1]
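A minimal sketch of that calculation in Python, assuming the score is the posterior mean of the upvote proportion. The function name `smoothed_upvote_fraction` and the example vote counts are hypothetical, and the default x = y = 1 is just a uniform prior, not anything reddit is known to use.

```python
def smoothed_upvote_fraction(upvotes, downvotes, x=1.0, y=1.0):
    """Posterior mean of the upvote proportion under a Beta(x, y) prior."""
    if upvotes < 0 or downvotes < 0:
        raise ValueError("vote counts must be non-negative")
    if x <= 0 or y <= 0:
        raise ValueError("Beta parameters must be positive")
    return (upvotes + x) / (upvotes + downvotes + x + y)


# Hypothetical posts as (upvotes, downvotes); illustrative numbers only.
posts = {"new_post": (1, 0), "old_post": (90, 10)}

# Rank by smoothed fraction, best first.
ranked = sorted(
    posts,
    key=lambda name: smoothed_upvote_fraction(*posts[name]),
    reverse=True,
)
print(ranked)
```

With the raw fraction, the post with a single upvote (1/1 = 100%) would beat the one at 90/100. With the prior added it scores 2/3 and drops below 91/102, which is the low-sample-size correction in action.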
In my college AI class we used this exact method with x=y=1 to adjust for low sample size. Someone should swap out the clunky frequentist method reddit apparently uses for this Bayesian method!
[1] This seems to be what it says in the pdf.