Hmm… here’s the scheme I’d use. I’d partition the horoscopes into two classes based on the widths of the Wilson confidence intervals at some specific confidence level. Horoscopes with intervals wider than some threshold are classed as poorly-characterized; otherwise, well-characterized. To select a horoscope, first select a class with probability proportional to the size of the class, e.g., if 20% of horoscopes are currently well-characterized, select the well-characterized horoscopes with probability 20%. If the chosen class is the well-characterized horoscopes, use score-weighted random selection within that class; otherwise, select uniformly at random among the poorly-characterized horoscopes.
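A minimal sketch of that selection scheme, assuming each horoscope record carries a precomputed score and an interval width (the attribute names and the threshold being passed in are my own placeholders, not anything specified above):

```python
import random

def pick_horoscope(horoscopes, width_threshold):
    """Two-class selection: 'horoscopes' is a list of objects with
    .score (assumed non-negative) and .ci_width attributes."""
    well = [h for h in horoscopes if h.ci_width <= width_threshold]
    poor = [h for h in horoscopes if h.ci_width > width_threshold]
    # Pick a class with probability proportional to its size...
    if random.random() < len(well) / len(horoscopes):
        # ...score-weighted random selection within the well-characterized class
        return random.choices(well, weights=[h.score for h in well])[0]
    # ...or uniform selection within the poorly-characterized class
    return random.choice(poor)
```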
Or, more simply, use weighted scores for everything but count the horoscopes with the fewest votes twice?
Maybe consider a horoscope ‘new’ if it has fewer than 10% as many votes as the most-voted-on horoscope, double the scores of new horoscopes, and calculate normally from there? I have a sneaking suspicion that this will lead to a situation where some horoscopes have enough votes to no longer be ‘new’ but too low a score to compete with the others, and wind up stuck. It may be self-correcting, though, if we re-check each horoscope’s ‘new’ status every day: when the highest-scoring horoscope comes up to earn votes, any horoscopes stuck in the doldrums get a temporary boost. It may also help to apply ‘new’ status to horoscopes with fewer than 10% of the votes of the 30th most-voted-on horoscope; 30 is relevant because it’s the number of days a horoscope is considered ‘recently used’ when there are more than 60 horoscopes, and thus the fastest one can see a repeat.
Or, maybe have horoscopes with <10% as many votes as the relevant high-scorer get their score doubled, and horoscopes with 10-20% as many votes get a smaller bonus? That sounds like it will work… though, after testing, I’m not actually sure it’s necessary. That may be an artifact of the dummy data I used; I’ll poke at it more in a bit.
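For what it’s worth, a sketch of that tiered boost; the 1.5× factor for the 10-20% band is an assumption on my part, since all that’s said above is ‘a smaller bonus’:

```python
def boosted_score(score, votes, high_votes):
    """Boost under-voted horoscopes, tiered by their vote count relative
    to the relevant high-scorer's. The 1.5x middle-band factor is a
    placeholder; only the 2x bottom-band factor is specified above."""
    ratio = votes / high_votes
    if ratio < 0.10:
        return score * 2.0   # <10% of the high-scorer's votes: double
    if ratio < 0.20:
        return score * 1.5   # 10-20%: a smaller bonus
    return score
```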
Sanity check data:
HighVote=100
HighHarmful=3
HighUseless=17
HighSOUseful=35
HighUseful=40
HighAwesome=5
Lower-bound adjusted numbers, 95% confidence
AdjHarmful=0.010 × −15 = −0.15
AdjUseless=0.109 × −1 = −0.109
AdjSOUseful=0.264 × 1 = 0.264
AdjUseful=0.309 × 3 = 0.927
AdjAwesome=0.022 × 10 = 0.22
Total adj. votes=0.714
Total points=1.152
HighScore=1.152/0.714=1.613
Test1Votes=10
Test1Score=0.24 (arbitrary, based on old test data given roughly-similar voting profile)
Test1AdjustedScore=0.48 (about 30% the chance of High—not bad)
Test2Votes=15
T2Harmful=1, adjusted=0.012, weighted=-0.18
T2Useless=2, adjusted=0.037, weighted=-0.037
T2SOUseful=5, adjusted=0.152, weighted=0.152
T2Useful=6, adjusted=0.198, weighted=0.594
T2Awesome=1, adjusted=0.012, weighted=0.12
vote total: 0.411
weighted total: 0.649
base score: 0.649/0.411 = 1.579 (probably doesn’t need adjusting, actually)
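To make the arithmetic above reproducible, here’s a sketch of the scoring as I read it off the numbers: each category count gets the lower bound of its 95% Wilson interval, those bounds are multiplied by the category weights (−15, −1, 1, 3, 10, taken from the worked lines above), and the score is total points divided by total adjusted votes. Note the hand calculations above round intermediates to three decimals, so this prints ≈1.603 for High rather than 1.613:

```python
import math

Z = 1.96  # two-sided 95% confidence

def wilson_lower(k, n, z=Z):
    """Lower bound of the Wilson score interval for k votes out of n."""
    p = k / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)

# Category weights, read off the worked examples above
WEIGHTS = {"Harmful": -15, "Useless": -1, "SOUseful": 1, "Useful": 3, "Awesome": 10}

def score(votes):
    """Total weighted points divided by total adjusted votes."""
    n = sum(votes.values())
    adj = {cat: wilson_lower(k, n) for cat, k in votes.items()}
    points = sum(WEIGHTS[cat] * a for cat, a in adj.items())
    return points / sum(adj.values())

high = {"Harmful": 3, "Useless": 17, "SOUseful": 35, "Useful": 40, "Awesome": 5}
test2 = {"Harmful": 1, "Useless": 2, "SOUseful": 5, "Useful": 6, "Awesome": 1}
print(round(score(high), 3), round(score(test2), 3))  # ≈1.603 and ≈1.581
```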
The reason I suggest looking at the width of the Wilson confidence interval, rather than directly at the number of votes, is that the width of the interval is a direct measure of the information we have about a horoscope. It’s hard to reason about what is likely to happen in terms of raw vote counts; what we really care about is the precision with which a horoscope’s quality is known. In particular, learning the quality of extreme horoscopes (either very good or very bad) takes fewer votes than learning about 50-percenters, a fact which will be reflected in the width of the confidence interval.
That does make sense. It doesn’t help that each horoscope has 5 intervals, though. Maybe look at the narrowest one for each horoscope?
That seems reasonable.
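A sketch of that, then; a horoscope’s characterization is the narrowest of its five per-category interval widths, at the same 95% level as the sanity-check numbers:

```python
import math

Z = 1.96  # same 95% confidence level as the sanity-check data

def wilson_width(k, n, z=Z):
    """Full width (upper minus lower bound) of the Wilson score
    interval for k votes out of n."""
    p = k / n
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return 2 * margin / (1 + z * z / n)

def narrowest_width(votes):
    """Characterize a horoscope by the narrowest of its five
    per-category intervals, per the suggestion above."""
    n = sum(votes.values())
    return min(wilson_width(k, n) for k in votes.values())
```

A horoscope would then be classed as poorly-characterized whenever narrowest_width(votes) exceeds the chosen threshold.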