Something of interest: the Jeffreys interval. Using the lower bound of a credible interval based on that distribution (which is the same as yours) will probably give better results than just using the mean: it handles small sample sizes more gracefully. (I think, but I'm certainly willing to be corrected.)
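Something like this, as a rough sketch (SciPy, upvote-style counts, and a 95% credible level are all just my assumptions here; the function name is mine):

```python
from scipy.stats import beta

def jeffreys_lower_bound(positive, total, level=0.95):
    """Lower end of the equal-tailed Jeffreys credible interval for the
    'true' positive rate: the posterior is Beta(k + 1/2, n - k + 1/2)."""
    if total == 0:
        return 0.0
    return beta.ppf((1 - level) / 2, positive + 0.5, total - positive + 0.5)

# A 2-out-of-2 item no longer scores a perfect 1.0, while a 95-out-of-100 item
# keeps most of its score; that's the "handles small samples" point.
print(jeffreys_lower_bound(2, 2), jeffreys_lower_bound(95, 100))
```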
But I fear that it would cause irreparable damage if the world settles on this solution.
This is probably vastly exaggerating the possible consequences; it's just a method of sorting, and both the Wilson interval method and a Bayesian method are definitely far better than the naive methods.
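For concreteness, the naive methods (upvotes minus downvotes, and the raw positive fraction, if I remember Miller's article right) each have an obvious failure mode; toy numbers:

```python
def net_score(up, down):      # naive method 1: upvotes minus downvotes
    return up - down

def raw_average(up, down):    # naive method 2: positive fraction
    return up / (up + down)

# A mediocre but heavily voted item beats a clearly better one on net score:
print(net_score(600, 400), net_score(60, 2))   # 200 vs 58
# A single lucky vote beats a solid track record on the raw average:
print(raw_average(1, 0), raw_average(98, 2))   # 1.0 vs 0.98
```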
I recently did a similar thing for ranking vendors by feedback, using both a Jeffreys interval and a Wilson interval; even on the vendors with little feedback, they were overall pretty similar. IIRC, I don’t think they differed by more than 10% anywhere.
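Roughly the two lower bounds I compared, sketched from memory (SciPy for the Jeffreys bound, the closed-form Wilson score bound, 95% in both cases; the exact numbers I used for vendors aren't reproduced here):

```python
import math
from scipy.stats import beta

def jeffreys_lower(positive, total, level=0.95):
    # Lower end of the Beta(k + 1/2, n - k + 1/2) credible interval.
    if total == 0:
        return 0.0
    return beta.ppf((1 - level) / 2, positive + 0.5, total - positive + 0.5)

def wilson_lower(positive, total, z=1.96):
    # Lower bound of the Wilson score interval (Miller's recommendation).
    if total == 0:
        return 0.0
    p = positive / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / (1 + z * z / total)

# Even with little feedback the two bounds track each other closely.
for pos, tot in [(3, 4), (9, 10), (45, 50), (180, 200)]:
    print(pos, tot, round(jeffreys_lower(pos, tot), 3), round(wilson_lower(pos, tot), 3))
```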
I forgot to link in the OP. Then remembered, and forgot again.
The Jeffreys interval seems to use specific parameters for the beta distribution. In the model I describe, the parameters are tailored per domain. This is actually an important distinction.
I think using the lower bound of an interval makes every item “guilty until proven innocent”—with no data we assume the item is of low quality. In my method we give the mean quality of all items (and it is important we calibrate the parameters for the domain). Which is better is debatable.
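To make the contrast concrete (the prior below is made up for illustration; in the model I have in mind, the two prior parameters would be calibrated to the domain's actual ratings):

```python
from scipy.stats import beta

def jeffreys_lower(positive, total, level=0.95):
    # The lower-bound approach suggested above (Jeffreys prior Beta(1/2, 1/2)).
    return beta.ppf((1 - level) / 2, positive + 0.5, total - positive + 0.5)

# Hypothetical domain-calibrated prior: pretend fitting this domain's ratings
# gave roughly Beta(4, 1), i.e. most items here are liked.
PRIOR_A, PRIOR_B = 4.0, 1.0

def posterior_mean(positive, total):
    # Smoothed mean: an item with no ratings starts at the domain-wide mean (0.8).
    return (positive + PRIOR_A) / (total + PRIOR_A + PRIOR_B)

# With zero data the lower bound is essentially zero ("guilty until proven
# innocent"), while the smoothed mean starts at the domain average.
print(jeffreys_lower(0, 0), posterior_mean(0, 0))
print(jeffreys_lower(9, 10), posterior_mean(9, 10))
```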
I just feel that it will place this low-hanging fruit out of reach. E.g.:
Me: Hey Reddit, I have this cool new sorting method for you to try!
Reddit: What do you mean? We’ve already moved beyond the naive methods into the correct method. Here, see Miller’s paper. No further changes are needed.
Maybe I’m exaggerating—I mean, things can be improved again after being improved once—but I just feel that if the world had a “naive rating method” itch to scratch, and something like Miller’s method became the go-to method, something is wrong.
(Link to How Not To Sort By Average Rating.)