So I’ve got to ask… do my posts not get voted up as much as the other regular posters’ because an upvote doesn’t seem to signal much, or because people actually don’t like my posts that much? Vote up if the former explanation, down if the latter.
I’ve often used voting to encourage posters that I like. Since I know that Eliezer has a long history of blogging, I don’t see him as needing the same level of encouragement that new posters might need, so I’m not always so quick to upvote his posts even when I think it is deserved.
I often catch myself using “other Eliezer posts” as the reference class for an Eliezer post, versus “posts in general” as the reference class for everyone else’s posts. That holds you to a much higher standard, especially since I best remember your early Overcoming Bias posts where you were picking off low-hanging fruit. It’s unfair to you and I’m trying to stop it. Anti-kibbitzer doesn’t work here because I go to new posts from the Recent Posts sidebar, plus your writing style’s hard to miss.
I guess that counts as an upvote.
That’s generally fine, since I still get information out of which of my posts are being upvoted versus downvoted. I just have to know whether I should consider that signal commensurate with the signals other posts are getting (because if so that implies I should hurry up and finish this arc, then write less). But it sounds like the answer is no, on the whole.
The latter.
Write shorter posts. Write in a simpler and less oracular prose style. And write more substantive posts—at times, it seems as if you believe your every passing thought deserves 2,000 words. I’ll often read your posts and, while recognizing some germ of a worthwhile idea there, regret the time and effort it took to locate it.
My impression is that your posts get voted up as much as anybody’s. Just look at the Karma score list. Look at the “Recent Posts”: 3, 4, 2, 22, 3, 5, 7, 18, 0, 25. You have the 22.
You might keep in mind the “Do I want my time back?” criterion that someone posted recently. Your posts are very long. Shorter posts will get more up-votes from people using that criterion.
Also, although I fear you will abuse this viewpoint, I think that everyone has their own “IQ window”, and they will down-vote posts that are either below or above their window.
To anyone who agrees with an upvote not signalling much: Please reconsider to what extent the value of upvoting is to communicate to the author vs. to other readers. One would assume that, eventually, we’d like LW to have a healthy population of people who haven’t necessarily read OB for years and may not be familiar with Eliezer’s previous work, so won’t realize the higher standard being applied.
Furthermore, I’m pretty sure that Eliezer is capable of reading the comments and comparing scores between articles, so holding him to a higher relative standard isn’t actually providing substantial additional feedback to him. Based on this, it seems to me that rating Eliezer differently than you would other authors is strictly suboptimal.
I can imagine an argument that holding him to a higher standard does provide more information, because he gets more information the closer the probability of an upvote is to 50%. However, in practice I suspect this is an argument in favour of more upvotes, and in truth I’d be surprised if there isn’t a name for the cognitive bias about judging a thing against a narrower category even when you’re asked to judge it against a wider one.
Your idea about getting closer to 50% probability of an upvote in order to get more information identifies a weakness in the voting system. It doesn’t matter as much for comments, but I think it is inadequate for articles.
Much better than having to put every article into one of three categories—up, down, or neither—would be to have a slider that starts at 0 and can take values between −100 and +100. What we have now is equivalent to something like having −100 to −33.3 all mapped to ‘down’, −33.3 to +33.3 all mapped to neither, and +33.3 to +100 all mapped to ‘up’. Obviously, lots of information is being discarded by design.
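To make the information loss concrete, here is a minimal sketch of the mapping described above. The function name `to_three_way` and the sample slider values are made up for illustration; nothing like this exists in the LW codebase as far as this thread says.

```python
def to_three_way(slider: float) -> int:
    """Collapse a slider value in [-100, 100] to -1 (down), 0 (neither) or +1 (up)."""
    if not -100 <= slider <= 100:
        raise ValueError("slider must be in [-100, 100]")
    third = 100 / 3  # the +/-33.3 cut points from the comment above
    if slider < -third:
        return -1
    if slider > third:
        return 1
    return 0


# Hypothetical slider positions from five readers.
ratings = [40, 95, -10, 30, -80]
votes = [to_three_way(r) for r in ratings]
print(votes)
```

A lukewarm 40 and an enthusiastic 95 both come out as a plain upvote, and a mildly positive 30 is indistinguishable from a mildly negative -10.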
Another problem is that votes aren’t normalized with respect to the user that cast the vote. An up vote from a user who rarely votes up should be worth more than one from someone who votes everything up.
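One possible way to do that normalization, sketched under the assumption that we keep each user's vote history as +1 / 0 / -1 per article they read. The z-score scheme and the name `normalize_votes` are illustrative choices, not something proposed elsewhere in this thread.

```python
from statistics import mean, pstdev


def normalize_votes(history: dict[str, int]) -> dict[str, float]:
    """Z-score one user's votes against that same user's voting history."""
    values = list(history.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Someone who votes identically on everything expresses no relative preference.
        return {article: 0.0 for article in history}
    return {article: (v - mu) / sigma for article, v in history.items()}


# Hypothetical histories: one user upvotes everything, the other upvotes rarely.
generous = {"a": 1, "b": 1, "c": 1, "d": 1}
picky = {"a": 1, "b": 0, "c": 0, "d": 0}

print(normalize_votes(generous)["a"])
print(normalize_votes(picky)["a"])
```

Under this scheme the generous user's upvote on article "a" carries no weight, while the picky user's upvote on the same article counts for well over one standard unit.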
Also, there could be distorting effects due to different subsets of readers preferentially reading different subsets of articles. If readers coming to LW without having read OB tend to vote differently (which is plausible since OB folks have not voted for years and may think of not voting up or down as the default, with a vote being for special emphasis), and they tend to read different sorts of articles (simpler articles on easier topics), the articles they read will appear to be wildly more popular.
The slider is an interesting notion. It adds user-interface complexity, and may have incentive problems for users who desire to exert control, but potentially garners a substantially more useful form of information.
At the moment the current score is a strong influence on how I vote on comments: I vote to move the score to the value I’d like it to have. This is somewhat unstable; directly specifying a personal score and taking a median would be less problematic.
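A toy comparison of the two approaches, assuming each voter has a target score in mind. `steer` models "vote to move the score toward my target" and is a deliberate simplification of my own; the targets are invented.

```python
from statistics import median


def steer(targets: list[int], start: int = 0) -> int:
    """Each voter in turn nudges the running score one step toward their own target."""
    score = start
    for target in targets:
        if score < target:
            score += 1
        elif score > target:
            score -= 1
    return score


# The same three hypothetical voters, arriving in two different orders.
order_a = [5, 0, 5]
order_b = [0, 5, 5]

print(steer(order_a), steer(order_b))    # steering depends on arrival order
print(median(order_a), median(order_b))  # the median does not
```

With steering, the final score depends on who happened to vote first. Taking the median of the personal scores gives the same answer regardless of order, and a single extreme voter cannot drag it far.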
The problem of the desire to exert control makes me think that a better option is giving a limited number of double/super/special votes that users can ration out as they see fit. Extra votes that actually mean something.
That’s a good idea. Though I didn’t say it originally, when I mentioned normalization of a vote with respect to the user that cast it, I meant not only that it should be normalized against the average rating of a vote for that user but also against how much the user votes in general—users who rate everything would then have less influence per vote than users who vote less frequently. If that were the case, then people who prefer to ration their votes and use them only for things they feel very strongly about (or have thought carefully about) would not have much less influence on what is popular and the direction of the site, as they currently do.
Having a slider requires a more-sophisticated data analysis, because different people use different rating scales. Typically psychologists use a multi-point scale, then use Rasch analysis (a form of item response theory) on the data.
I would say from my experience that a 5-point scale is not big enough; almost everything gets 3 or 4 points, except from the people (about 2% of raters) who binarize the scale by giving everything either a 1 or a 5. Also, people will not use negative ratings, so don’t try to center them on zero. People (or at least Americans) just can’t say “zero is average”.
My instinct would be to have the numbers not be visible to the user. You just have a rectangle with two colors, initially red on the right side and green on the left side. Clicking anywhere inside the rectangle changes the dividing line to be at that location. So clicking 90% of the way towards the right would make the left 90% be green and the right 10% be red. The backend would know that it corresponds to whatever number it corresponds to (+80 according to the scheme I gave earlier), but the user just has a qualitative feel for how much of the mass they’ve allocated to the good (green) color and how much to the bad (red) color.
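The backend mapping would presumably be linear. A minimal sketch, assuming the click position arrives as a fraction of the rectangle's width (0.0 at the left edge, 1.0 at the right); `click_to_score` is a made-up name.

```python
def click_to_score(fraction_from_left: float) -> float:
    """Map a click position in [0, 1] to a score in [-100, +100].

    The green region covers everything to the left of the click, so a click
    further right means a higher score.
    """
    if not 0.0 <= fraction_from_left <= 1.0:
        raise ValueError("click position must be between 0 and 1")
    return 200.0 * fraction_from_left - 100.0


print(click_to_score(0.9))  # the 90% example from the comment above
print(click_to_score(0.5))  # dividing line in the middle
```

A click 90% of the way across gives +80, matching the figure quoted above, and a click in the exact middle gives 0.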
Two things you could do about that:
As you hover over the rating button, the text below changes to indicate what that rating would mean. Zero stars means “don’t bother”, one star means “good enough to stay visible”, two stars means “above-average” and so on.
Allow half stars for more information.
We would use percentile score to make the best use of the votes of binarizing voters without giving them more influence than high-information voters.
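A rough sketch of what percentile scoring could look like, assuming each rating is converted to its mid-rank percentile within that voter's own ratings. The mid-rank convention and the name `percentile_scores` are my assumptions, not a worked-out proposal.

```python
def percentile_scores(ratings: dict[str, float]) -> dict[str, float]:
    """Convert one voter's raw star ratings to mid-rank percentiles in [0, 1]."""
    values = list(ratings.values())
    n = len(values)
    result = {}
    for item, rating in ratings.items():
        below = sum(v < rating for v in values)
        equal = sum(v == rating for v in values)  # includes the item itself
        result[item] = (below + 0.5 * equal) / n
    return result


# Hypothetical voters: one only ever gives 1 or 5 stars, the other uses the full scale.
binarizer = {"a": 5, "b": 5, "c": 1, "d": 1}
graded = {"a": 4, "b": 3, "c": 2.5, "d": 1}

print(percentile_scores(binarizer))
print(percentile_scores(graded))
```

The binarizing voter's scores end up spanning 0.25 to 0.75, while the voter who uses the whole scale spans 0.125 to 0.875, so the extreme raw ratings buy no extra influence.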
Amazon ranks stuff between ★☆☆☆☆ and ★★★★★ with a simple JavaScript mouse hover / mouse click to set the value. LW could copy that pretty easily. I suggest that 5 categories would be enough.
See PhilGoetz’s point below: “almost everything gets 3 or 4 points”.
Well, let’s look.
The top scoring articles seem to be rated in the 50-60 range, indicating at least 60 users who have voted. Eliezer’s articles seem to tend to be rated around 10-20, so that’s probably closer to a 30% chance of upvoting. As far as I could tell none of the top three rated posts are by Eliezer. Yvain seems to be the most consistently highly rated poster overall, with typical scores seemingly ranging from 20-40. Since Yvain roughly mimics Eliezer’s writing style and content, we could probably expect an unbiased rating of Eliezer’s posts to be similar. All around, as a very rough approximation, we can say that Eliezer’s posts are getting an upvote penalty of 50%.
Take all that as you will.
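For anyone who wants to check the back-of-envelope numbers above, here is the arithmetic spelled out. It assumes roughly 60 active voters, treats a post's score as its upvote count, and compares range midpoints; all three are simplifications layered on top of an already rough estimate.

```python
voters = 60                  # rough voter count inferred from the top-scoring articles
eliezer_range = (10, 20)     # typical scores quoted above
yvain_range = (20, 40)


def midpoint(score_range: tuple[int, int]) -> float:
    low, high = score_range
    return (low + high) / 2


# Chance that a given voter upvotes, at the low and high ends of Eliezer's range.
chance_low = eliezer_range[0] / voters
chance_high = eliezer_range[1] / voters

# Relative shortfall of Eliezer's midpoint score against Yvain's.
penalty = 1 - midpoint(eliezer_range) / midpoint(yvain_range)

print(f"upvote chance: {chance_low:.0%} to {chance_high:.0%}")
print(f"penalty vs. Yvain: {penalty:.0%}")
```

The upper end of the range is where the "closer to 30%" figure comes from, and the midpoint comparison gives the 50% penalty.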
I’d imagine there is a name. Whatever it is, I consistently fall prey to it with most intuitive self-evaluations (comparing myself mostly to groups of which I am not a representative member).
One thing you don’t mention is that Yvain’s posts and writing style are simpler and easier to comprehend than Eliezer’s. Yvain has also presented some posts on fairly basic topics that are probably familiar to most longtime OB readers but are new to readers just joining LW. [EDIT: I retract the last point. I was thinking of the ‘priming’ post and that there were others like this on basic heuristics and biases topics, but that seems like the only one.]
That is not to say that there’s not also some bias. I think many of us probably consciously or unconsciously hold Eliezer to much higher standards than anybody else.
All the recent talk about cults and cult-like behavior has probably made some people more hesitant to vote up anything by Eliezer as well.
Not to be contrary, but I actually find Eliezer’s posts easier to comprehend, partly due to better structure and pacing, partly due to a typical slightly higher informational content holding my attention better. I suspect this is mostly a function of Eliezer having more practice, and of my own short attention span, heh.
I was going to say that I expect the cultishness discussion to be more directly relevant to the upvoting penalty, but looking quickly at post scores doesn’t seem to support that theory.
It sounds like a form of availability bias, but I agree it needs a more precise term.
I choose option C: I don’t think your current posts are as important or as worth discussing as some other current posts.
The sequence on getting people to work together is only interesting if you are trying to form a specific type of fractious group. A group that cares about the world in 20-30 years will be very fractious because predicting the future is hard and has no particular methodology (and most people get it wrong) so most people will have different ideas of the future and hence different strategies for what should be done now.
Edit: You seem to be in a filler arc at the moment, whereas other people are starting their main sequences, to put it in anime terms.
I like the current season, as it were :-) - I’m very interested in group organising stuff and I think it’s important. I’m looking forward to the next season from the newer contributors—it often takes a season to find your stride...
I like your ‘anime terms’ explanation—I also pick option ‘C’, along with a bit of agreeing with Yvain.
I think you’d get higher ratings for more substantive posts, things in the vein of the posts on quantum physics, zombies, pebblesorters, etc.
Also, I consciously try to correct for bias in your favor, and I suspect others do the same (your posts are recognizable, even with Yvain imitating your style).
I think you’d get higher ratings for more substantive posts, things in the vein of the posts on quantum physics, zombies, pebblesorters, etc.
That should apply to everybody, not just Eliezer. I think you’re comparing Eliezer’s LW writings to his most substantive OB writings, but that’s not a standard that is applied to anybody else. LW is intentionally more casual and more tolerant of shorter, less substantive posts.
The cited text was a general prediction.
My point was not very clear. I realize you made a general prediction. What I mean is that if you made the prediction as a way of explaining the discrepancy, then it doesn’t explain it, because Eliezer’s posts are no less substantive than other posts with higher ratings, and if more substantive posts raised the scores of his posts, it would raise the scores of other posts too, and the disparity would remain—unless different standards are being applied to Eliezer than to others (such as other posts being rated relative to all LW posts as a whole, and Eliezer’s posts being rated relative to his OB posts as a whole).
Reading the RSS feed, it’s a significant extra step to vote. I’m more inclined to do that for new authors than those I already think highly of.
I also assumed that, as admin, your posts were automatically promoted. But maybe that’s something you only sometimes elect.
Since you’re using the data to judge reactions to your work, I hereby promise not to employ any (counter)-biasing strategy in praising you.
Again, that’s fine! You don’t need to change anything! I just need to know whether LW is telling me to shut up or not. The relative data on which posts of mine people like more is still good.
I think I hold you to a higher standard than others too.
A little of both I suspect. You had a bit of a quiet time there where you weren’t posting many ‘important’ posts while (for example) Yvain was letting out years of repressed blogging brilliance.
I definitely do the former, because I presumed that everyone knew you were God ± 10%, so I ought to vote to give you information on the utility of different styles, topics, and the like. If your karma vis-a-vis other posters is also significant, I suppose I’ll try to upvote you more than I do now, but still less than I do for others; otherwise I’d end up upvoting practically everything you write.