I don’t think it’s super productive to go into this with a ton of depth, but I do also think that voting is for expressing preferences, just that it’s better to model the preference as “on a scale from 1 to 1000, how good is this post?”, instead of “is this post good or bad?”. And you implement the former by upvoting if the post’s current karma is below your assessment, and downvoting if it is above, with the strong version being used when it’s particularly far away from where your assessment is. This gives you access to a bunch more data than if everyone just votes independently (i.e. voting independently results in posts just above the threshold for “good enough to strong-upvote” for a lot of users getting the same karma as a post that is in the top 5 of all-time favorite posts for everyone who upvoted it).
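If it helps, the rule I’m describing can be sketched like this (the margins and the ±2 strong-vote weights here are made-up assumptions for illustration, not the site’s actual values):

```python
# Sketch of the "vote toward your assessment" rule (hypothetical numbers).
# Each voter rates the post on a 1-1000 scale and compares that to where
# the post's current karma seems to place it; a large gap triggers the
# strong variants, a small gap means abstaining.

def vote(my_rating, implied_rating, margin=100, strong_margin=300):
    """Return a vote: +2/-2 for strong up/down, +1/-1 for normal, 0 to abstain."""
    gap = my_rating - implied_rating
    if gap >= strong_margin:
        return 2   # strong upvote: karma far below my assessment
    if gap >= margin:
        return 1   # upvote: karma somewhat below my assessment
    if gap <= -strong_margin:
        return -2  # strong downvote: karma far above my assessment
    if gap <= -margin:
        return -1  # downvote
    return 0       # karma already roughly matches my assessment

print(vote(my_rating=800, implied_rating=400))  # strong upvote
print(vote(my_rating=450, implied_rating=400))  # abstain
```

The point is that votes become corrections toward a target, so later voters react to earlier ones rather than voting independently.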
In either case I am interested in an independent assessment, just that the assessment moves from “binary good/bad” to “numerical ordering of preferences”.
The problem with this view is that there does not seem to be any way to calibrate the scale. What should be the karma of a good post? A bad post? A mediocre one? What does 20 mean? What does 5 mean? Don’t the answers to these questions depend on how many users are voting on the post, and what their voting behavior is? Suppose you and I both hold the view you describe, but I think a good post should have 100 karma and you think a good post should have 300 karma—how should our voting behavior be interpreted? What does it mean, when a post ends up with, say, 75 karma? Do people think it’s good? Bad? Do we know?
This gets very complicated. It seems like the signal is degraded, not improved, by this.
i.e. voting independently results in posts just above the threshold for “good enough to strong-upvote” for a lot of users getting the same karma as a post that is in the top 5 of all-time favorite posts for everyone who upvoted it
It seems to me like your perspective results in an improved signal only if everyone who votes has the same opinions on everything.
If people do not have the same opinions, then there will be a distribution across people’s “good enough to strong-upvote” thresholds; a post’s karma will then reflect its position along that distribution. A “top 5 all-time favorite for many people” will be “good enough to strong-upvote” for most people, and will have a high score. A post that is “just good enough to upvote” for many people will cross that threshold for fewer voters, i.e. will sit lower along that distribution, and will end up with a lower score. (In other words, a post’s score is, in expectation, each vote weight × the probability of that vote, summed across all voters.)
If everyone has the same opinion, then this will simply result in either everyone strong-upvoting it or no one strong-upvoting it—and in that case, my earlier concern about differently calibrated scales also does not apply.
So, your interpretation seems optimal if adopted by a user population with extremely homogeneous opinions. It is strongly sub-optimal, however, if adopted by a user population with a diverse range of opinions; in that scenario, the “votes independently indicate one’s own evaluation” interpretation is optimal.