Very well thought out. I think the two biggest things missing from your analysis are:
5-star ratings have become corrupted in the wild. Small-time authors get legitimately angered when a fan rates their work 4 stars on Amazon, because anything other than 5 stars is very damaging. We don’t want to port this behavior/intuition to LW, but by default that’s what we’d do. jimrandomh mentions this in their comment. I don’t know how to overcome this problem while retaining 5-star ratings.
Users like to “correct” a post/comment’s rating. Personally I hate this behavior, but after ranting about it several times over the years I’ve learned that I don’t represent everyone. :D So if they see a comment with an average of 2 stars, which they think should be 3.5, they will not want to rate it 3.5. Instead they will want to rate it 5.0 to “make up for” the other “wrongheaded” views. Maybe one way to overcome this problem is to allow a lightweight way to say “I think this is 3.5. I want to use my QJR to fix the current rating so I’m voting 5.0. Please increase the chance a mod sees this, double down on my bet, and maybe even change my rating back to 3.5 if the post becomes 3.5?? And also if a mod rates a post maybe this should drastically reduce the effective QJR of people contradicting the mod… or something, if a mod says 3.5 and the weighted avg is still only 2.5 I won’t be happy”.
I suggested the 5-star interface because it’s the most common way of giving things scores on a fixed scale. From my perspective, we could just as easily use a slider or a number between 0 and 100. I think we want to err towards intuitive/easy interfaces even if it means porting over some bad intuitions from Amazon or whatever, but I’m not confident on this point.
I toyed with the idea of having a strong-bet option, which lets a user put down a stronger QJR bet than normal, and thus influence the community rating more than they would by default (albeit exposing them to higher risk). I mainly avoided it in the above post because it seemed like unnecessary complexity, although I appreciate the point about people overcompensating in order to have more influence.
One idea that I just had is that instead of setting the community rating by the weighted mean, perhaps we should use the weighted median. The effect would be that voting 5 stars on a 2-star post has exactly the same amount of sway as voting 3.5, right up until the 3.5 line is crossed. I really like this idea, and will edit the post body to mention it. Thanks!
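To make the mean-versus-median point concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the function names, the made-up votes, the equal weights, and the choice of the "lower" weighted median (the smallest rating at which cumulative weight reaches half the total). It is not the post's actual aggregation rule.

```python
def weighted_mean(votes):
    """votes: list of (rating, weight) pairs."""
    total = sum(w for _, w in votes)
    if total <= 0:
        raise ValueError("need at least one vote with positive weight")
    return sum(r * w for r, w in votes) / total


def weighted_median(votes):
    """Lower weighted median: the smallest rating at which the
    cumulative weight reaches half of the total weight."""
    ordered = sorted(votes)
    total = sum(w for _, w in ordered)
    if total <= 0:
        raise ValueError("need at least one vote with positive weight")
    running = 0.0
    for rating, weight in ordered:
        running += weight
        if running >= total / 2:
            return rating


# Hypothetical existing votes on a post sitting at roughly 2 stars.
existing = [(1.5, 1.0), (2.0, 1.0), (2.0, 1.0), (2.5, 1.0), (4.0, 1.0)]

honest = existing + [(3.5, 1.0)]       # voter rates what they believe
exaggerated = existing + [(5.0, 1.0)]  # voter "corrects" with 5 stars

# Under the mean, exaggerating moves the rating further.
mean_honest = weighted_mean(honest)
mean_exaggerated = weighted_mean(exaggerated)

# Under the median, both votes land above the current rating and count the same.
median_honest = weighted_median(honest)
median_exaggerated = weighted_median(exaggerated)
```

With these made-up numbers the exaggerated vote pulls the mean higher than the honest one does, while the median comes out identical in both cases, so there is nothing to gain from overshooting as long as the community rating is still below the voter's true estimate.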
Another issue I’d highlight is one of complexity. When I consider how much math is involved:
This post involves Gaussians, logarithms, weighted means, integration, and probably a few other things I missed.
The current karma system uses...addition? Sometimes subtraction?
One of these things is much more transparent to new users.
I am a huge fan of tiered-complexity views on complex underlying systems. The description to new users would be:
Ratings are a magic median-like combination of how users rated a post. Click through for more details...
Displayed ratings are the median of how users have rated the post/comment. Smoothed. Weighted by how LessWrongy the rater has been. Your own rating will have more effect when your historical ratings are good predictions of how trusted moderators end up rating. Click through for more details...
Sometimes mods will rate posts/comments, after careful reflection on how they want LessWrong in general to rate. When they do, everyone who previously rated will be awarded additional weight to their future votes if their ratings were similar to what the mod decided, or penalized with less future vote weight if their ratings were pretty far off. That’s how the weights are determined when aggregating people’s votes on comments. Of course, it’s more complicated than that. Folks were grandfathered in. New folks [behavior]. Mods who regularly differ from other mods and high-weight voters trigger investigation into whether they should be mods anymore, or whether everyone is getting something wrong. Multiple mod votes are a thing, as is voting similarly to high-weight voters (?? maybe ?? is it ??), as is promoting high-weight voters to mods, as is etc etc. Click through for more details, including math...
[treatise]
[link to the documented code]
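As a rough illustration of the mod-rating tier above, here is a toy sketch of how a mod's rating could nudge voter weights. To be clear, this is not the QJR math from the post (which, per the discussion above, involves Gaussians and logarithms). The function name, the linear rule, and the `tolerance`, `step`, and `floor` parameters are all hypothetical, picked only to show the "reward close ratings, penalize distant ones" shape.

```python
def update_weights(weights, ratings, mod_rating,
                   tolerance=1.0, step=0.1, floor=0.1):
    """Return new vote weights after a mod rates a post.

    weights:    dict of user -> current vote weight
    ratings:    dict of user -> rating that user gave this post
    mod_rating: the rating the mod decided on
    All parameters and the linear rule are illustrative assumptions.
    """
    updated = dict(weights)
    for user, rating in ratings.items():
        error = abs(rating - mod_rating)
        # Positive when the user was within `tolerance` of the mod,
        # negative (and growing) the further off they were.
        change = step * (tolerance - error) / tolerance
        current = updated.get(user, 1.0)  # assumed default for new voters
        updated[user] = max(floor, current * (1 + change))
    return updated


# Hypothetical example: the mod rates the post 3.5.
weights = {"alice": 1.0, "bob": 1.0, "carol": 1.0}
ratings = {"alice": 3.5, "bob": 1.0}  # carol did not rate this post
new_weights = update_weights(weights, ratings, mod_rating=3.5)
```

Here alice matched the mod and gains weight, bob was 2.5 stars off and loses weight, and carol is untouched because she never rated the post. A real version would also need the grandfathering, new-user, and multiple-mod behavior mentioned above, none of which this sketch attempts.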