First of all, thank you, Duncan, for this post. I feel like it captures important perspectives that I’ve had and problems that I can see, and puts them together in a pretty good way. (I also share your perspective that the post Could Be Better in several ways, but I respect you not letting the perfect be the enemy of the good.)
I find myself irritated right now (bothered, not angry) that our community’s primary method of highlighting quality writing is by karma-voting. It’s a similar kind of feeling to living in a democracy—yes, there are lots of systems that are worse, but really? Is this really the best we can do? (No particular shade on Ruby or the Lightcone team—making things is hard and I’m certainly glad LW exists and is as good as it is.)
Like, I think I have an idea that might make things substantially better that’s not terrible: make the standard signal for quality be a high price on a quality-arbitrated betting market. This is essentially applying the concept of Futarchy to internet forums (h/t ACX and Hanson). (If this is familiar to you, dear reader, feel free to skip to responses to this comment, where I talk about features of this proposal and other ideas.) Here’s how I could see it working:
When a user makes a post or comment or whatever, they also name a number between 0 and 100. This number is essentially a self-assessment of quality, where 0 means “I know this is flagrant trolling” and 100 means “This is obviously something that any interested party should read”. As an example, let’s say that I assign this comment an 80.
Now let’s say that you are reading and you see my comment and think “An 80? Bah! More like a 60!” You can then “downvote” the comment, which nudges the number down, or enter your own (numeric) estimate, which dramatically shifts the value towards your estimate (similar to a “strong” vote). Behind the scenes, the site tracks the disagreement. Each user is essentially making a bet around the true value of the post’s quality. (The downvote is a bet that it’s “less than 80”.) What are they betting? Reputation as judges! New users start with 0 judge-of-quality reputation, unless they get existing users to vouch for them and donate a bit of reputation. (We can call this “karma,” but I think it is very important to distinguish good-judge karma from high-quality-writing karma!) When voting/betting on a post/comment, they stake some of that reputation (maybe 10%, up to a cap of 50? Just making up numbers here for the sake of clarity; I’d suggest actually running experiments).
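To make the mechanics concrete, here’s a minimal sketch of how a single vote or estimate might move the displayed number and record the stake. Every specific in it (the NUDGE and PULL constants, the function names, the exact stake rule) is an assumption of mine filling in details the proposal leaves open:

```python
from dataclasses import dataclass, field

# Made-up constants: the proposal only says a downvote "nudges" the number and an
# explicit estimate "dramatically shifts" it, staking "maybe 10% up to a cap of 50".
NUDGE = 2             # how far a plain up/downvote moves the displayed number
PULL = 0.25           # how strongly a numeric estimate drags the number toward it
STAKE_FRACTION = 0.10
STAKE_CAP = 50

@dataclass
class Bet:
    user: str
    direction: str          # "over" or "under" the price at the time of the bet
    price_at_bet: float
    estimate: float | None  # set only for explicit numeric estimates
    stake: float

@dataclass
class Post:
    author: str
    self_assessment: float             # 0-100, named by the author at posting time
    price: float = field(init=False)   # the displayed quality number
    bets: list[Bet] = field(default_factory=list)

    def __post_init__(self):
        self.price = self.self_assessment

def stake_for(reputation: float) -> float:
    """Stake 10% of the voter's judge-reputation, capped at 50 (made-up numbers)."""
    return min(STAKE_FRACTION * reputation, STAKE_CAP)

def downvote(post: Post, user: str, reputation: float) -> None:
    """A plain 'downvote': a bet that quality is below the current price; nudge it down."""
    post.bets.append(Bet(user, "under", post.price, None, stake_for(reputation)))
    post.price = max(0.0, post.price - NUDGE)

def give_estimate(post: Post, user: str, reputation: float, value: float) -> None:
    """A numeric estimate: a bet on a specific value; pull the price strongly toward it."""
    direction = "over" if value > post.price else "under"
    post.bets.append(Bet(user, direction, post.price, value, stake_for(reputation)))
    post.price += PULL * (value - post.price)
```

So if I post this comment at a self-assessment of 80 and you, with 200 reputation, enter an estimate of 60, you’d stake 20 and the displayed number would move part of the way toward 60.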
Then, you have the site randomly sample pieces of writing, weighting the sampling towards those that are most controversial (ie have the most reputation on the line). Have the site assign these pieces of writing to moderators whose sole job is to study that piece of writing and the surrounding context and to score its quality. (Perhaps you want multiple moderators. Perhaps there should be appeals, in the form of people betting against the value set by the moderator. Etc. More implementation details are needed.) That judgment then resolves all the bets, and results in users gaining/losing reputation.
Users who run out of reputation can’t actually bet, and so lose the ability to influence the quality-indicator. However, all people who place bets (or try to place bets when at zero/negative reputation) are subsidized a small amount of reputation just for participating. (This inflation is a feature, encouraging participation in the site.) Thus, even a new user without any vouch can build up the ability to influence the signal by participating and consistently being right.
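Building on the sketch above, here’s an equally hedged sketch of the resolution side: controversy-weighted sampling, a moderator ruling that settles the bets, and the small participation subsidy. The win/lose rule and the subsidy amount are placeholders of mine, not part of the proposal:

```python
import random

PARTICIPATION_SUBSIDY = 1   # small reputation grant for betting at all (made-up number)

def sample_for_review(posts: list[Post], k: int = 1) -> list[Post]:
    """Randomly pick posts for moderator review, weighted toward those with the
    most reputation on the line (i.e., the most controversial ones)."""
    weights = [sum(b.stake for b in p.bets) + 1e-9 for p in posts]
    return random.choices(posts, weights=weights, k=k)

def resolve(post: Post, moderator_score: float, reputation: dict[str, float]) -> None:
    """A moderator's quality judgment resolves every outstanding bet on the post."""
    for bet in post.bets:
        if bet.estimate is not None:
            # Numeric estimates win if they were closer to the ruling than the price was.
            won = abs(bet.estimate - moderator_score) < abs(bet.price_at_bet - moderator_score)
        else:
            # Plain up/downvotes win if the ruling landed on the side they bet.
            won = (moderator_score > bet.price_at_bet) == (bet.direction == "over")
        reputation[bet.user] = reputation.get(bet.user, 0.0) + (bet.stake if won else -bet.stake)
        # Everyone who bet gets the small participation subsidy, win or lose.
        reputation[bet.user] += PARTICIPATION_SUBSIDY
    post.price = moderator_score
    post.bets.clear()
```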
To my mind the primary features of this system that bear on Duncan’s top-level post are:
High-reputation judges can confidently set the quality signal for a piece of writing, even if they’re in the minority. The truth is not a popularity contest, even when it comes to quality.
The emphasis on betting means that people who “upvote” low-quality posts or “downvote” high-quality ones are punished, making “this made me feel things, and so I’m going to bandwagon” a dangerous mental move. And people who make this sort of move would be efficiently sidelined.
In concert, I expect that it would be much easier to bring concentrated force down on low-quality bits of writing. Which would, in turn, I think, make the quality price/signal a much more meaningful piece of information, instead of the current karma score, which, as others have noted, is overloaded as a measure.
I like this idea. It has a lot of nice attributes.
I wrote some in the past about what all the different things are that a voting/karma system on LW is trying to produce, with some thoughts on some proposals that feel a bit similar to this: https://www.lesswrong.com/posts/EQJfdqSaMcJyR5k73/habryka-s-shortform-feed?commentId=8meuqgifXhksp42sg
Nice. Thank you. How would you feel about me writing a top-level post reconsidering alternative systems and brainstorming/discussing solutions to the problems you raised?
Seems great! It’s a bit on ice this week, but we’ve been thinking very actively about changes to the voting system, and so right now is the time to strike while the iron is hot if you want to change the team’s opinion on how we should change things and what we should experiment with.
I think this is too complex for a comment system, but upvoted for an interesting and original idea.
My sense is that the basic UI interaction of “look at a price and judge it as wrong” has the potential to be surprisingly simple for a comment section. I often have intuitions that something is “overpriced” or “underpriced”.
But I find the grounding-out process pretty hard to swallow. I’d be spending so much of my time thinking about who was grounding it out and how to model them socially, which is a far more costly operation than my current one, which is just “do I think the karma number should go up or down?”
But also strong upvoted for an exciting and original idea.
One obvious flaw with this proposal is that the quality-indicator would only be a measure of the expected rating by a moderator. But who says that our moderators are the best judges of quality? Like, the scheme is ripe for corruption, simply pushing the popularity contest one level up to a small group of elites.
One answer is that if you don’t like the mods, you can go somewhere else. Vote with your feet, etc.
A more turtles-all-the-way-down answer is that the stakeholders of LW (the users, and possibly influential community members/investors?) agree on an aggregate set of metrics for how well the moderators are collectively capturing quality. Then, for each unit of time (e.g., a year) and each potential moderator, set up a conditional prediction market with real dollars on whether that person being a moderator causes the metrics to go up or down compared to the previous time unit. Hire the ones that people predict will be best for the site.
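As a rough illustration of that decision step (the function and its inputs are invented for the example; the real version would be actual markets with real dollars), you’d compare each candidate’s market-predicted metric against last period’s and hire the ones predicted to do best:

```python
def pick_moderators(conditional_forecasts: dict[str, float],
                    last_period_metric: float,
                    slots: int) -> list[str]:
    """conditional_forecasts[name] is the market's prediction of next period's
    aggregate quality metric, conditional on hiring that person as a moderator.
    Hire the candidates predicted to improve on last period, best first."""
    ranked = sorted(conditional_forecasts.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, predicted in ranked[:slots] if predicted > last_period_metric]
```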
I guess the question is: what is the optimal amount of consensus? Where do we want to be on the scale from Eternal September to Echo Chamber?
Seems to me that the answer depends on how correct we are, on average. To emphasise: how correct we actually are, not how correct we want to be or imagine ourselves to be.
On a website where moderators are correct about almost everything, most disagreement is noise. (It may provide valuable feedback on “what other people believe”, but not on how things actually are.) It is okay to punish disagreement, because in the rare situations where it is correct and you notice it, you can afford the karma hit for opposing the moderators. (And hopefully the moderators are smart enough to start paying attention when a member in good standing surprisingly decides to take a karma hit.)
On a website where moderators are quite often wrong, punishing disagreement means that the community will select for people who share the same biases, or who are good at reading the room.
I believe that people are likely to overestimate how much “other reasonable people” agree with them, which is why echo chambers can happen to people who genuinely see themselves as “open to other opinions, as long as those opinions are not obviously wrong (spoiler: most opinions you disagree with do seem obviously wrong)”. As a safety precaution against going too far in a positive feedback loop (because even if you believe that the moderators already go too far in some direction, the prediction voting incentivizes you to downvote all comments that point it out), there should be a mechanism to express thoughts that go against the moderator consensus. Like, a regular thread to say “I believe the moderators are wrong about X” without being automatically punished for being right. That is, a thread with special rules where moderators would upvote comments for being well-articulated without necessarily being correct.
I also want to note that this proposal isn’t mutually exclusive with other ideas, including other karma systems. It seems fine to have an additional indicator of popularity that is distinct from quality. Or, more to my liking, a button that simply marks that you found a post interesting and/or expresses gratitude towards the writer, without making a statement about how bulletproof the reasoning was. (This might help capture the essence of Rule Thinkers In, Not Out and reward newbies for posting.)