This is the way I would do it, also taking into account EY’s point of not hiding away new comments:
Assume each comment has an ‘upvote rate’ U, such that the probability that a comment of age t has u upvotes follows a Poisson distribution with parameter Ut,
P(u|U, t) = (Ut)^u exp(−Ut)/u!
and similarly for downvotes,
P(d|D, t) = (Dt)^d exp(−Dt)/d!
If the prior probability distribution for U and D is P(U, D), their posterior probability distribution will be
P(U, D|u, d, t) = D^d U^u exp(−(D + U)t) P(U, D)/(a normalization constant).
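To make that concrete (my own addition, not something the parent comment specifies): if you pick independent Gamma priors for U and D, the Poisson likelihoods above are conjugate to them, so the posterior update is closed-form. The shape/rate parameters a and b below are placeholders for this sketch, not values anyone proposed.

```python
# A minimal sketch, assuming independent Gamma(a, b) priors on U and D
# (shape a, rate b). The Poisson likelihood above is conjugate to a Gamma
# prior, so the posterior is again a Gamma with updated parameters.

def posterior_params(u, d, t, a=1.0, b=1.0):
    """Return Gamma posterior (shape, rate) pairs for U and D.

    u, d : observed upvotes and downvotes
    t    : age of the comment (in whatever unit the rates use)
    a, b : prior shape and rate (a = b = 1 is an arbitrary placeholder prior)
    """
    # P(U | u, t) ∝ U^u exp(-Ut) * U^(a-1) exp(-bU)  =>  Gamma(a + u, b + t)
    upvote_posterior = (a + u, b + t)
    # The same algebra applies to the downvote rate D.
    downvote_posterior = (a + d, b + t)
    return upvote_posterior, downvote_posterior
```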
Then, you sort comments according to a functional of the posterior pdf of U and D; in analogy with expected utility maximization you could use the posterior expectation value of some function f(U, D), but other choices would be possible. (This reduces to your proposal when you take f(U, D) = U/(U + D).)
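Continuing the Gamma-prior sketch above (again an assumption of mine, not part of the original proposal): because both posteriors share the same rate b + t, the ratio U/(U + D) is Beta-distributed, so the posterior expectation of f(U, D) = U/(U + D) has a closed form; any other choice of f can be handled by Monte Carlo.

```python
import numpy as np

def expected_fraction(u, d, a=1.0):
    """Closed-form posterior mean of U/(U+D) under the Gamma sketch above.

    With independent Gamma(a + u, b + t) and Gamma(a + d, b + t) posteriors
    sharing the same rate, U/(U+D) ~ Beta(a + u, a + d), whose mean is:
    """
    return (a + u) / (2 * a + u + d)

def expected_f(u, d, t, f, a=1.0, b=1.0, n_samples=100_000, seed=0):
    """Monte Carlo posterior expectation of an arbitrary f(U, D)."""
    rng = np.random.default_rng(seed)
    U = rng.gamma(shape=a + u, scale=1.0 / (b + t), size=n_samples)
    D = rng.gamma(shape=a + d, scale=1.0 / (b + t), size=n_samples)
    return f(U, D).mean()

# Sort key for a comment with 10 upvotes, 2 downvotes, 6 hours old:
score = expected_fraction(10, 2)                        # = 11/14 ≈ 0.786
mc    = expected_f(10, 2, 6.0, lambda U, D: U / (U + D))  # ≈ same value
```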
Of course this model isn’t entirely realistic because U and D ought to vary with time (according to timezone, how old the thread is and whether it’s currently linked to from the main page, etc.), but the main effect of disregarding this (pretending that a comment has the same probability of getting upvoted in the 10,000th hour after its publication as in the 1st hour) would be to cause very recent comments to be sorted higher, which IMO is a Good Thing anyway.
Why a Poisson distribution? It seems fairly clear we are looking at Bernoulli trials (people who look either upvote or not). I doubt it’s a rare enough event (though it depends on the site, I suppose) that a Poisson is a better approximation than a normal.
I think it’s reasonable to model this as a Poisson process: there are many people who could in theory vote, only a few of them actually do, and they do so at random times.
You’d need to know how many people read each comment, though.
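To make the “many potential voters, each with a small probability of voting” argument concrete (the reader count and vote probability below are made up for illustration), here is a quick numerical comparison of a binomial model with its Poisson approximation:

```python
from scipy.stats import binom, poisson

# Hypothetical numbers: 2,000 readers, each voting with probability 0.003,
# i.e. about 6 expected votes. Compare P(exactly k votes) under both models.
n, p = 2000, 0.003
for k in range(12):
    b = binom.pmf(k, n, p)
    q = poisson.pmf(k, n * p)
    print(f"k={k:2d}  binomial={b:.4f}  poisson={q:.4f}")
# The two columns agree to roughly three decimal places. In this rare-event
# regime only the product n*p matters, which is why the Poisson rate model
# doesn't require knowing how many people read each comment.
```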
I think some factor for the decrease in voting over time should be included. Exponentially decaying rates seem reasonable, and the decay time constant can be calibrated from the overall voting data in the domain (assuming we have data on voting times available).
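One way to fold that in without changing the rest of the model (a sketch under the assumptions above, with a made-up decay constant): replace the elapsed time t in the Poisson parameters with the “effective exposure” ∫₀ᵗ exp(−s/τ) ds = τ(1 − exp(−t/τ)), and everything else goes through unchanged.

```python
import math

def effective_exposure(t, tau=72.0):
    """Exposure accumulated by age t under an exponentially decaying
    vote rate with time constant tau (72 hours is a made-up value).

    Integral of exp(-s/tau) from 0 to t; it replaces t in the Poisson
    parameters U*t and D*t (and in the Gamma posterior rate b + t).
    """
    return tau * (1.0 - math.exp(-t / tau))
```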
That’s likely way too fast. It’s not that rare for people to comment on posts several years old (especially on Main), and I’d guess such people also vote on comments. (Well, I most certainly do.) You can use an exponential decay with a very large time constant, but that would mean that comments from yesterday are voted on nearly as often as comments from three months ago. So, the increase in realism compared to a constant rate isn’t large enough to justify the increase in complexity. (OTOH, hyperbolic decay is likely much more realistic, but it also has more parameters.)
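For comparison (again just a sketch with made-up constants, using the simplest one-parameter hyperbolic form; the more realistic variants with an extra exponent add the parameters mentioned above): the effective exposure under a rate decaying as 1/(1 + s/τ) grows like a logarithm instead of saturating, so very old comments still accumulate some expected votes.

```python
import math

def hyperbolic_exposure(t, tau=72.0):
    """Effective exposure under a vote rate decaying as 1 / (1 + s/tau):
    the integral from 0 to t is tau * ln(1 + t/tau). Unlike the exponential
    version it keeps growing (slowly), matching the observation that
    years-old comments still get occasional votes."""
    return tau * math.log(1.0 + t / tau)
```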