This is the way I would do it, also taking into account EY’s point of not hiding away new comments:
Assume each comment has an ‘upvote rate’ U, such that the probability that a comment of age t has u upvotes follows a Poisson distribution with parameter Ut,
P(u|U, t) = (Ut)^u exp(−Ut)/u!
and similarly for downvotes,
P(d|D, t) = (Dt)^d exp(−Dt)/d!
If the prior probability distribution for U and D is P(U, D), their posterior probability distribution will be
P(U, D|u, d, t) = D^d U^u exp(−(D + U)t) P(U, D)/(a normalization constant).
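To make that concrete (my own addition, not something the parent comment specifies): if you pick independent Gamma priors for U and D, the Poisson likelihoods above are conjugate to them, so the posterior update is closed-form. The shape/rate parameters a and b below are placeholders for this sketch, not values anyone proposed.

```python
# A minimal sketch, assuming independent Gamma(a, b) priors on U and D
# (shape a, rate b). The Poisson likelihood above is conjugate to a Gamma
# prior, so the posterior is again a Gamma with updated parameters.

def posterior_params(u, d, t, a=1.0, b=1.0):
    """Return Gamma posterior (shape, rate) pairs for U and D.

    u, d : observed upvotes and downvotes
    t    : age of the comment (in whatever unit the rates use)
    a, b : prior shape and rate (a = b = 1 is an arbitrary placeholder prior)
    """
    # P(U | u, t) ∝ U^u exp(-Ut) * U^(a-1) exp(-bU)  =>  Gamma(a + u, b + t)
    upvote_posterior = (a + u, b + t)
    # The same algebra applies to the downvote rate D.
    downvote_posterior = (a + d, b + t)
    return upvote_posterior, downvote_posterior
```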
Then, you sort comments according to a functional of the posterior pdf of U and D; in analogy with expected utility maximization you could use the posterior expectation value of some function f(U, D), but other choices would be possible. (This reduces to your proposal when you take f(U, D) = U/(U + D).)
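Continuing the Gamma-prior sketch above (again an assumption of mine, not part of the original proposal): because both posteriors share the same rate b + t, the ratio U/(U + D) is Beta-distributed, so the posterior expectation of f(U, D) = U/(U + D) has a closed form; any other choice of f can be handled by Monte Carlo.

```python
import numpy as np

def expected_fraction(u, d, a=1.0):
    """Closed-form posterior mean of U/(U+D) under the Gamma sketch above.

    With independent Gamma(a + u, b + t) and Gamma(a + d, b + t) posteriors
    sharing the same rate, U/(U+D) ~ Beta(a + u, a + d), whose mean is:
    """
    return (a + u) / (2 * a + u + d)

def expected_f(u, d, t, f, a=1.0, b=1.0, n_samples=100_000, seed=0):
    """Monte Carlo posterior expectation of an arbitrary f(U, D)."""
    rng = np.random.default_rng(seed)
    U = rng.gamma(shape=a + u, scale=1.0 / (b + t), size=n_samples)
    D = rng.gamma(shape=a + d, scale=1.0 / (b + t), size=n_samples)
    return f(U, D).mean()

# Sort key for a comment with 10 upvotes, 2 downvotes, 6 hours old:
score = expected_fraction(10, 2)                        # = 11/14 ≈ 0.786
mc    = expected_f(10, 2, 6.0, lambda U, D: U / (U + D))  # ≈ same value
```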
Of course this model isn’t entirely realistic because U and D ought to vary with time (according to timezone, how old the thread is and whether it’s currently linked to from the main page, etc.), but the main effect of disregarding this (pretending that a comment has the same probability of getting upvoted in the 10,000th hour after its publication as in the 1st hour) would be to cause very recent comments to be sorted higher, which IMO is a Good Thing anyway.
Why a Poisson distribution? It seems fairly clear we are looking at Bernoulli trials (people who look either upvote or not). I doubt it’s a rare enough event (though it depends on the site, I suppose) that a Poisson is a better approximation than a normal.
I think it’s reasonable to model this as a Poisson process: there are many people who could in theory vote, only a few of them actually do, and they do so at random times.
You’d need to know how many people read each comment, though.
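To make the “many potential voters, each with a small probability of voting” argument concrete (the reader count and vote probability below are made up for illustration), here is a quick numerical comparison of a binomial model with its Poisson approximation:

```python
from scipy.stats import binom, poisson

# Hypothetical numbers: 2,000 readers, each voting with probability 0.003,
# i.e. about 6 expected votes. Compare P(exactly k votes) under both models.
n, p = 2000, 0.003
for k in range(12):
    b = binom.pmf(k, n, p)
    q = poisson.pmf(k, n * p)
    print(f"k={k:2d}  binomial={b:.4f}  poisson={q:.4f}")
# The two columns agree to roughly three decimal places. In this rare-event
# regime only the product n*p matters, which is why the Poisson rate model
# doesn't require knowing how many people read each comment.
```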
I think some factor for the decrease in voting over time should be included. Exponentially decaying rates seem reasonable, and the decay time constant can be calibrated from the overall voting data in the domain (assuming we have data on voting times available).
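One way to fold that in without changing the rest of the model (a sketch under the assumptions above, with a made-up decay constant): replace the elapsed time t in the Poisson parameters with the “effective exposure” ∫₀ᵗ exp(−s/τ) ds = τ(1 − exp(−t/τ)), and everything else goes through unchanged.

```python
import math

def effective_exposure(t, tau=72.0):
    """Exposure accumulated by age t under an exponentially decaying
    vote rate with time constant tau (72 hours is a made-up value).

    Integral of exp(-s/tau) from 0 to t; it replaces t in the Poisson
    parameters U*t and D*t (and in the Gamma posterior rate b + t).
    """
    return tau * (1.0 - math.exp(-t / tau))
```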
That’s likely way too fast. It’s not that rare for people to comment on posts several years old (especially on Main), and I’d guess such people also vote on comments. (Well, I most certainly do.) You can use an exponential decay with a very large time constant, but that would mean that comments from yesterday are voted on nearly as often as comments from three months ago. So, the increase in realism compared to a constant rate isn’t large enough to justify the increase in complexity. (OTOH, hyperbolic decay is likely much more realistic, but it also has more parameters.)
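For comparison (again just a sketch with made-up constants, using the simplest one-parameter hyperbolic form; the more realistic variants with an extra exponent add the parameters mentioned above): the effective exposure under a rate decaying as 1/(1 + s/τ) grows like a logarithm instead of saturating, so very old comments still accumulate some expected votes.

```python
import math

def hyperbolic_exposure(t, tau=72.0):
    """Effective exposure under a vote rate decaying as 1 / (1 + s/tau):
    the integral from 0 to t is tau * ln(1 + t/tau). Unlike the exponential
    version it keeps growing (slowly), matching the observation that
    years-old comments still get occasional votes."""
    return tau * math.log(1.0 + t / tau)
```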