(a) Thanks for making the effort!
(b)
“I am currently experimenting with a karma system based on the concept of eigendemocracy by Scott Aaronson, which you can read about here, but which basically boils down to applying Google’s PageRank algorithm to karma allocation.”
This won’t work, for the same reason PageRank did not work: you can game it by collusion. Communities are excellent at collusion. I think the important thing to do is to make toxic people (defined in a socially constructed way as people you don’t want around) go away. I don’t think ranking posts from best to worst among the folks who remain is that helpful. People will know quality without numbers.
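For readers who want to see what “applying PageRank to karma allocation” could mean concretely, here is a minimal sketch. The damping factor, user names, and vote weights are all invented for illustration; this is not the system habryka describes, just the general shape of an eigenvector-based karma scheme.

```python
def eigenkarma(endorsements, damping=0.85, iterations=50):
    """endorsements: dict mapping voter -> {endorsed_user: vote_weight}."""
    users = list(endorsements)
    karma = {u: 1.0 / len(users) for u in users}          # start uniform
    for _ in range(iterations):
        new = {u: (1 - damping) / len(users) for u in users}
        for voter, votes in endorsements.items():
            total = sum(votes.values())
            if total == 0:
                continue
            for target, weight in votes.items():
                # A voter's influence is their own current karma,
                # split across the people they endorse.
                new[target] = new.get(target, 0.0) + damping * karma[voter] * (weight / total)
        karma = new
    return karma

# Hypothetical endorsement graph: who upvotes whom, and how much.
votes = {"alice": {"bob": 2, "carol": 1},
         "bob":   {"carol": 3},
         "carol": {"alice": 1}}
print(eigenkarma(votes))
```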
“This won’t work, for the same reason PageRank did not work”
I am very confused by this. Google’s search vastly outperformed its competitors with PageRank and is still using a heavily tweaked version of PageRank to this day, delivering by far the best search on the market. It seems to me that PageRank should widely be considered to be the most successful reputation algorithm that has ever been invented, having demonstrated extraordinary real-world success. In what way does it make sense to say “PageRank did not work”?
FWIW, I worked at Google about a decade ago, and even then, PageRank was basically no longer used. I can’t imagine it’s gotten more influence since.
It did work, but I got the strong sense that it no longer worked.
Google is using a much more complicated algorithm that is constantly tweaked, and is a trade secret—precisely because as soon as it became profitable to do so, the ecosystem proceeded to game the hell out of PageRank.
Google hasn’t been using PageRank-as-in-the-paper for ages. The real secret sauce behind Google is not eigenvalues, it’s the fact that it’s effectively anti-inductive, because the algorithm isn’t open and there is an army of humans looking for attempts to game it, and modifying it as soon as such an attempt is found.
Given that, it seems equally valid to say “this will work, for the same reason that PageRank worked”, i.e., we can also tweak the reputation algorithm as people try to attack it. We don’t have as many resources as Google, but then we also don’t face as many attackers (with as strong incentives) as Google does.
I personally do prefer a forum with karma numbers, to help me find quality posts/comments/posters that I would likely miss or have to devote a lot of time and effort to sift through.
It’s not PageRank that worked, it’s anti-induction that worked. PageRank stopped working as soon as it faced resistance.
You really are a “glass half empty” kind of guy, aren’t you?
I am not really trying to be negative for the sake of being negative here, I am trying to correctly attribute success to the right thing. People get “halo effect” in their head because “eigenvalues” sound nice and clean.
Reputation systems, though, aren’t the type of problem that linear algebra will solve for you. And this isn’t too surprising. People are involved with reputation systems, and people are far too complex for linear algebra to model properly.
True, but not particularly relevant. Reputation systems like karma will not solve the problem of who to trust or who to pay attention to—but they are not intended to. Their task is to be merely helpful to humans navigating the social landscape. They do not replace networking, name recognition, other reputation measures, etc.
I think votes have served several useful purposes.
Downvotes have been a very good way of enforcing the low-politics norm.
When there’s lots of something, you often want to sort by votes, or some ranking that mixes votes and age. Right now there aren’t many comments per thread, but if there were 100 top-level comments, I’d want votes. Similarly, as a new reader, it was very helpful to me to look for old posts that people had rated highly.
How are you going to prevent gaming the system and collusion?
Goodhart’s law: you can game metrics, you can’t game targets. Quality speaks for itself.
Curious as to why you think that LW2.0 will have a problem with gaming karma when LW1.0 hasn’t had such a problem (unless you count Eugine, and even if you do, we’ve been promised the tools for dealing with Eugines now).
I think this roughly summarizes my perspective on this. Karma seems to work well for a very large range of online forums and applications. We didn’t really have any problems with collusion on LW outside of Eugine, and that was a result of a lack of moderator tools, not a problem with the karma system itself.
I agree that you should never fully delegate your decision-making process to a simple algorithm (that’s what the value-loading problem is all about), but that’s what we have moderators and admins for. If we see suspicious behavior in the voting patterns, we investigate, and if we find someone is gaming the system, we punish them. This is how practically all social rules and systems get enforced.
LW1.0’s problem with karma is that karma isn’t measuring anything useful (certainly not quality). How can a distributed voting system decide on quality? Quality is not decided by majority vote.
The biggest problem with karma systems is in people’s heads—people think karma does something other than what it does in reality.
That’s the exact opposite of my experience. Higher-voted comments are consistently more insightful and interesting than low-voted ones.
Obviously not decided by it, but aggregating lots of individual estimates of quality sure can help discover the quality.
This was also my experience (on LW) several years ago, but not recently. On Reddit, I don’t see much difference between highly- and moderately-upvoted comments, only poorly-upvoted comments (in a popular thread) are consistently bad.
I guess we fundamentally disagree. Lots of people with no clue about something aren’t going to magically transform into a method for discerning clue regardless of aggregation method—garbage in garbage out. For example: aggregating learners in machine learning can work, but requires strong conditions.
Do you disagree with Kaj that higher-voted comments are consistently more insightful and interesting than low-voted ones?
It sounds like you are making a different point: that no voting system is a substitute for having a smart, well-informed userbase. While that is true, that is also not really the problem that a voting system is trying to solve.
Sure do. On stuff I know a little about, what gets upvoted is “LW folk wisdom” or perhaps “EY’s weird opinions” rather than anything particularly good. That isn’t surprising. Karma, being a numerical aggregate of the crowd, is just spitting back a view of the crowd on a topic. That is what karma does—nothing to do with quality.
What if the view of the crowd is correlated with quality?
Every crowd thinks so.
I think LessWrong might be (or at the very least was once) such a place where this is actually true.
Every crowd thinks they are such a place where it’s actually true. Outside view: they are wrong.
Some of the extreme sceptics do not believe they are much closer to the truth than anyone else.
There does not exist a group such that consensus of the group is highly correlated with truth? That’s quite an extraordinary claim you’re making; do you have the appropriate evidence?
I think Ilya is not claiming that no such group exists but that it is well nigh impossible to know that your group is one such. At least where the claim is being made very broadly, as it seems to be upthread. I don’t think it’s unreasonable for experimental physicists to think that their consensus on questions of experimental physics is strongly correlated with truth, for instance, and I bet Ilya doesn’t either.
More specifically, I think the following claim is quite plausible: When a group of people coalesces around some set of controversial ideas (be they political, religious, technological, or whatever), the correlation between group consensus and truth in the area of those controversial ideas may be positive or negative or zero, and members of the group are typically ill-equipped to tell which of these cases they’re in.
LW has the best epistemic hygiene of all the communities I’ve encountered and/or participated in.
In so far as epistemic hygiene is positively correlated with truth, I expect LW consensus to be more positively correlated with truth than most (not all) other internet communities.
Doesn’t LW loudly claim to be special in this respect?
And if it actually is not, doesn’t this represent a massive failure of the entire project?
Talking about LW, specifically. Presumably, groups exist that truth-track, for example experts on their area of expertise. LW isn’t an expert group.
The prior on LW is the same as on any other place on the internet, it’s just a place for folks to gab. If LW were extraordinary, truth-wise, they would be sitting on an enormous pile of utility.
I disagree. Epistemic hygiene is genuinely better on LW, and insofar as epistemic hygiene is positively correlated with truth, I expect LW consensus to be more positively correlated with truth than most (not all) other internet communities.
A group of experts will not necessarily truth-track—there are a lot of counterexamples from gender studies to nutrition.
I would probably say that a group which implements its ideas in practice and is exposed to the consequences is likely to truth-track. That’s not LW, but that’s not most of academia either.
I don’t think LW is perfect; I think LW has the best epistemic hygiene of all communities I’ve encountered and/or participated in.
I think epistemic hygiene is positively correlated with truth.
I think that’s the core of the disagreement: I assume that if the forum is worth reading in the first place, then the average forum user’s opinion of a comment’s quality tends to correlate with my own. In which case something having lots of upvotes is evidence in favor of me also thinking that it will be a good comment.
This assumption does break down if you assume that the other people have “no clue”, but if that’s your opinion of a forum’s users, then why are you reading that forum in the first place?
“Clue” is not a total ordering of people from best to worst, it varies from topic to topic.
The other issue to consider is what you think the purpose of a forum is.
Consider a subreddit like TheDonald. Presumably they also use karma to reach consensus on what a good comment is. But TheDonald is an echo chamber. If your opinions are very correlated with the opinions of others in a forum, then naturally you get a number that tells you what everyone agrees is good.
That can be useful, sometimes. But this isn’t quality, it’s just community consensus, and that can be arbitrarily far off. “Less wrong,” as written on the tin, is supposedly about something more objective than just coming to a community consensus. You need true signal for that, and karma, being a mirror a community holds up to itself, cannot give it to you.
edit: the form of your question is: “if you don’t like TheDonald, why are you reading TheDonald?” Is that what you want to be saying?
Hopefully this question is not too much of a digression—but has anyone considered using something like Arxiv-Sanity, except for content (blog posts, articles, etc.) produced by the wider rationality community instead of papers? Because at least with that you are measuring similarity to things you have already read and liked, things other people have read and liked, or things people are linking to and commenting on, and you can search things pretty well based on content and authorship. Ranking things by what people have stored in their library and are planning on taking time to study might contain more information than karma.
Karma serves as an indicator of the reception that certain content got. High karma means several people liked it. Negative karma means it was very disliked, etc.
Keep tweaking the rules until you’ve got a system where the easiest way to get karma is to make quality contributions?
There probably exist karma systems which are provably non-gameable in relevant ways. For example, if upvotes are a conserved quantity (i.e. by upvoting you, I give you 1 upvote and lose 1 of my own upvotes), then you can’t manufacture them from thin air using sockpuppets.
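A minimal sketch of that conserved-quantity idea, just to make the invariant concrete (the class and method names here are hypothetical, not an existing library): the total number of points in the system never changes, so fresh accounts cannot mint karma out of thin air.

```python
class ConservedKarma:
    def __init__(self, seed_users, seed_points=10):
        # Only seed (trusted) users start with spendable points; everyone else
        # must receive upvotes before they can give any, which is what blocks
        # karma-minting via fresh sockpuppet accounts.
        self.points = {u: seed_points for u in seed_users}

    def upvote(self, voter, target):
        if self.points.get(voter, 0) <= 0:
            raise ValueError(f"{voter} has no upvotes left to give")
        self.points[voter] -= 1                                 # voter spends a point...
        self.points[target] = self.points.get(target, 0) + 1    # ...the target receives it

ledger = ConservedKarma(["admin"])
ledger.upvote("admin", "alice")       # fine: one point moves from admin to alice
# ledger.upvote("sockpuppet", "bob")  # would raise: new accounts start at zero
```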
However, it also seems like for a small community, you’re probably better off just moderating by hand. The point of a karma system is to automatically scale moderation up to a much larger number of people, at which point it makes more sense to hash out details. In other words, maybe I should go try to get a job on reddit’s moderator tools team.
This will never ever work. Predicting this in advance.
You should tell Google and academia; they will be most interested in your ideas. Don’t you think people already thought very hard about this? This is such a typical LW attitude.
Can you show me 3 peer-reviewed papers which discuss discussion site karma systems that differ meaningfully from reddit’s, and 3 discussion sites that implement karma systems that differ from reddit’s in interesting ways? If not, it seems like a neglected topic to me.
Maybe I’m just not very good at doing literature searches. I did a search on Google Scholar for “reddit karma” and found only one paper which focuses on reddit karma. It’s got brilliant insights such as
The aforementioned conflict between idealistically and quantitatively motivated contributions has however led to a discrepancy between value assessments of content.
...
I believe Robin Hanson when he says academics neglect topics if they are too weird-seeming. Do you disagree?
It’s certainly plausible that there is academic research relevant to the design of karma systems, but I don’t see why the existence of such research is a compelling reason to not spend 5 minutes thinking about the question from first principles on my own. Relevant quote.
Coincidentally, just a couple days ago I was having a conversation with a math professor here at UC Berkeley about the feasibility of doing research outside of academia. The professor’s opinion was that this is very difficult to do in math, because math is a very “vertical” field where you have to climb to the top before making a contribution, and as long as you are going to spend half a decade or more climbing to the top, you might as well do so within the structure of academia. However, the professor did not think this was true of computer science (see: stuff like Bitcoin which did not come out of academia).
You can’t do lit searches with Google. Here’s one paper with a bunch of references on attacks on reputation systems, and reputation systems more generally:
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36757.pdf
You are right that lots of folks outside of academia do research on this, in particular game companies (due to toxic players in multiplayer games). This is far from a solved problem—Valve, Riot and Blizzard spend an enormous amount of effort on reputation systems.
I don’t think there is a way to write this that doesn’t sound mean: because you are an amateur. IMO, the best way for amateurs to proceed is to (a) trust experts, (b) read expert stuff, and (c) mostly not talk. Chances are, your 5-minute thoughts on the matter are only adding noise to the discussion. In principle, taking expert consensus as the prior is a part of rationality. In practice, people ignore this part because it is not a practice that is fun to follow. It’s much more fun to talk than to read papers.
LW’s love affair with amateurism is one of the things I hate most about its culture.
My favorite episode in the history of science is how science “forgot” what the cure of scurvy was. In order for human civilization not to forget things, we need to be better about (a), (b), (c) above.
I appreciate the literature pointer.
What expert consensus are you referring to? I see an unsolved engineering problem, not an expert consensus.
My view of amateurism has been formed, in a large part, from reading experts on the topic:
Paul Graham: “The clash of domains is a particularly fruitful source of ideas. If you know a lot about programming and you start learning about some other field, you’ll probably see problems that software could solve. In fact, you’re doubly likely to find good problems in another domain: (a) the inhabitants of that domain are not as likely as software people to have already solved their problems with software, and (b) since you come into the new domain totally ignorant, you don’t even know what the status quo is to take it for granted.”
Richard Hamming: “Introspection, and an examination of history and of reports of those who have done great work, all seem to show typically the pattern of creativity is as follows. There is first the recognition of the problem in some dim sense. This is followed by a longer or shorter period of refinement of the problem. Do not be too hasty at this stage, as you are likely to put the problem in the conventional form and find only the conventional solution.”
Edward Boyden: “Synthesize new ideas constantly. Never read passively. Annotate, model, think, and synthesize while you read, even when you’re reading what you conceive to be introductory stuff.”
This past summer I was working at a startup that does predictive maintenance for internet-connected devices. The CEO has a PhD from Oxford and did his postdoc at Stanford, so probably not an amateur. But working over the summer, I was able to provide a different perspective on the problems that the company had been thinking about for over a year, and a big part of the company’s proposed software stack ended up getting re-envisioned and written from scratch, largely due to my input. So I don’t think it’s ridiculous for me to wonder whether I’d be able to make a similar contribution at Valve/Riot/Blizzard.
The main reason I was able to contribute as much as I did was because I had the gumption to consider the possibility that the company’s existing plans weren’t very good. Basically by going in the exact opposite direction of your “amateurs should stay humble” advice.
Here are some more things I believe:
If you’re solving a problem that is similar to a problem that has already been solved, but is not an exact match, sometimes it takes as much effort to re-work an existing solution as to create a new solution from scratch.
Noise is a matter of place. A comment that is brilliant by the standards of Yahoo Answers might justifiably be downvoted on Less Wrong. It doesn’t make sense to ask that people writing comments on LW try to reach the standard of published academic work.
In computer science, industry is often “ahead” of academia in the sense that important algorithms get discovered in industry first, then academics discover them later and publish their results.
Interested to learn more about your perspective.
(a) They also laughed at Bozo the Clown. (I think this is Carl Sagan’s quote).
(b) Outside view: how often do outsiders solve a problem in a novel way, vs just adding noise and cluelessness to the discussion? Base rates! Again, nothing that I am saying is controversial, having good priors is a part of “rationality folklore” already. Going with expert consensus as a prior is a part of “rationality folklore” already. It’s just that people selectively follow rationality practices only when they are fun to follow.
(c) “In computer science, industry is often “ahead” of academia in the sense that important algorithms get discovered in industry first”
Yes, this sometimes happens. But again, base rates. Google/Facebook is full of academia-trained PhDs and ex-professors, so the line here is unclear. It’s not amateurs coming up with these algorithms. John Tukey came up with the Fast Fourier Transform while at Bell Labs, but he was John Tukey, and had a math PhD from Princeton.
(Upvoted).
This is where we differ; I think the potential for substantial contribution vastly outweighs any “noise” that may be caused by amateurs taking stabs at the problem. I do not think all the low-hanging fruit are gone (and if they were, how would we know?), and I think that amateurs are capable of substantial contributions in several fields. I think that optimism towards open problems is a more productive attitude.
I support “LW’s love affair with amateurism”, and it’s a part of the culture I wouldn’t want to see disappear.
This reply contributes nothing to the discussion of the problem at hand, and is quite uncharitable. I hope such replies are discouraged, and if downvoting were enabled, I would have downvoted it.
If thinking that they can solve the problem at hand (and making attempts at it) is a “typical LW attitude”, then it is an attitude I want to see more of and believe should be encouraged (thus, I’ll be upvoting /u/John_Maxwell_IV’s post). A priori assuming that one cannot solve a problem (one that hasn’t been proven or isn’t known to be unsolvable) and thus refraining from even attempting it isn’t an attitude that I want to see become the norm on LessWrong. It’s not an attitude that I think is useful, productive, optimal or efficient.
It is my opinion that we want to encourage people to attempt problems of interest to the community: the potential benefits are vast (e.g. the problem is solved, and/or significant improvements are made on it, giving future endeavours a better starting point), while the potential demerits are of lesser impact (time, ours and that of whoever attempts it, is wasted on an unpromising solution).
Coming back to the topic that was being discussed, I think methods of costly signalling are promising (for example, when you upvote a post you transfer X karma to the user, and you lose k*X (k < 1)).
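A toy version of that costly-signalling vote, with arbitrary example values for X and k, might look like this:

```python
K = 0.25   # fraction of the transferred karma that the voter gives up (k < 1)

def costly_upvote(karma, voter, author, x=1.0, k=K):
    """Grant x karma to the author at a cost of k * x to the voter."""
    if karma[voter] < k * x:
        raise ValueError("voter cannot afford this upvote")
    karma[voter] -= k * x     # the vote costs the voter something
    karma[author] += x        # the author receives the full amount
    return karma

karma = {"reviewer": 10.0, "author": 0.0}
costly_upvote(karma, "reviewer", "author")
print(karma)   # {'reviewer': 9.75, 'author': 1.0}
```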
I have been here for a few years, I think my model of “the LW mindset” is fairly good.
I suppose the general thing I am trying to say is: “speak less, read more.” But at the end of the day, this sort of advice is hopelessly entangled with status considerations. So it’s hard to give to a stranger, and have it be received well. Only really works in the context of an existing apprenticeship relationship.
Status games aside, the sentiments expressed in my reply are my real views on the matter.
(“A priori” suggests lack of knowledge to temper an initial impression, which doesn’t apply here.)
There are problems one can’t by default solve, and a statement, standing on its own, that it’s feasible to solve them is known to be wrong. A “useful attitude” of believing something wrong is a popular stance, but is it good? How does its usefulness work, specifically, if it does, and can we get the benefits without the ugliness?
An optimistic attitude towards problems that are potentially solvable is instrumentally useful—and dare I argue—instrumentally rational. The drawbacks of encouraging an optimistic attitude towards open problems are far outweighed by the potential benefits.
(The quote markup in your comment designates a quote from your earlier comment, not my comment.)
You are not engaging the distinction I’ve drawn. Saying “It’s useful” isn’t the final analysis, there are potential improvements that avoid the horror of intentionally holding and professing false beliefs (to the point of disapproving of other people pointing out their falsehood; this happened in your reply to Ilya).
The problem of improving over the stance of an “optimistic attitude” might be solvable.
I know: I was quoting myself.
I guess for me it is.
The beliefs aren’t known to be false. It is not clear to me that believing one can solve a problem (one that isn’t known, proven, or even strongly suspected to be unsolvable) is a false belief.
What do you propose to replace the optimism I suggest?
Moderation is basically the only way, I think. You could try to use fancy pagerank-anchored-by-trusted-users ratings, or make votes costly to the user in some way, but I think moderation is the necessary fallback.
Goodhart’s law is real, but people still try to use metrics. Quality may speak for itself, but it can be too costly to listen to the quality of every single thing anyone says.
People use name recognition in practice; it works pretty well.
I can use name recognition to scroll through a comment thread to find all the comments by the people I hold in high regard, but this is much more effort than just having a karma system which automatically shows the top-voted comments first. (The karma system also doesn’t discriminate against new writers as badly as relying on name recognition does.)
Going to reply to this because I don’t think it should be overlooked. It’s a valid point—people tend to want to filter out information that’s not from sources they trust. I think these kinds of incentive pressures are what led to the “LessWrong Diaspora” being concentrated around specific blogs belonging to people with very positive reputations, such as Scott Alexander. And when people want to look at different sources of information, they will usually follow the advice of said people. This is how I operate when I’m doing my own reading / research—I start somewhere I consider to be the “safest” and move out from there according to the references given at that spot, and perhaps a few more steps outward.
When we use a karma / voting system, we are basically trying to calculate P(this contains useful information | this post has a high number of votes), but no voting system ever offers as much evidence as a specific reference from someone we recognize as trustworthy. The only way to increase the evidence gained from a voting system is to add further complexity to the system by increasing the amount of information contained in a vote, either by weighting the votes or by identifying the person behind the vote. And then from there you can add more to a vote, like a specific comment or a more nuanced judgement. I think the end of that track is basically what we have now: blogs by a specific person linking to other blogs, or social media like Facebook where no user is anonymous and everyone has their information filtered in some way.
Essentially I’m saying we should not ignore the role that optimization pressure has played in producing the systems we already have.
Which is why there should be a way to vote on users, not content; the quantity of unevaluated content shouldn’t dilute the signal. This would matter if the primary mission succeeds and there is actual conversation worth protecting.
Ranking helps me know what to read.
The SlateStarCodex comments are unusable for me because nothing is sorted by quality, so what’s at the top is just whoever had the fastest fingers and least filter.
Maybe this isn’t a problem for fast readers (I am a slow reader), but I find automatic sorting mechanisms to be super useful.
This. SSC comments I basically only read if there are very few of them, because of the lack of karma; on LW even large discussions are actually readable, thanks to karma sorting.
That’s an illusion of readability, though; it’s only sorting in a fairly arbitrary way.
As long as it’s not anti-correlated with quality, it helps.
It doesn’t matter if the top comment isn’t actually the very best comment. So long as the system does better than random, I as a reader benefit.
Over the years I’ve gone through periods of time where I can devote the effort/time to thoroughly reading LW and periods of time where I can basically just skim it.
Because of this, I’m in a good position to judge how reliably karma surfaces content worth reading.
My judgement is that karma strongly correlates with readability.
Oli and I disagree somewhat on voting systems. I think you get a huge benefit from doing voting at all, a small benefit from simple weighted voting (including not allowing people below ~10 karma to vote), and then there’s not much left to gain from complicated vote-weighting schemes (like eigenkarma and so on). Part of this is because more complicated systems aren’t necessarily harder to game.
There are empirical questions involved; we haven’t looked at, for example, the graph of what karma converges to if you use my simplistic vote weighting scheme vs. an eigenkarma scheme, but my expectation is a very high correlation. (I’d be very surprised if it were less than .8, and pretty surprised if it were less than .95.)
I expect the counterfactual questions—”how would Manfred have voted if we were using eigenkarma instead of simple aggregation?”—to not make a huge difference in practice, although they may make a difference for problem users.
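A hedged toy version of the comparison described above, assuming a small upvote graph: score it once with a simple karma-gated tally and once with an eigenkarma-style power iteration, then look at the correlation between the two (all data, thresholds, and scoring details are illustrative assumptions):

```python
# Hypothetical sketch of the empirical comparison: score the same upvote graph
# with (a) a simple karma-gated tally and (b) an eigenkarma-style power
# iteration, then report the Pearson correlation between the two rankings.

from statistics import correlation  # requires Python 3.10+

upvotes = {  # voter -> users they upvoted (toy data)
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": ["alice", "bob"],
    "dave": ["alice"],
}
prior_karma = {"alice": 50, "bob": 12, "carol": 3, "dave": 40}
users = sorted(prior_karma)

# (a) Simple scheme: one point per upvote, but only voters with >=10 karma count.
simple = {u: 0.0 for u in users}
for voter, targets in upvotes.items():
    if prior_karma[voter] >= 10:
        for t in targets:
            simple[t] += 1.0

# (b) Eigenkarma-style scheme: power iteration on the vote graph.
eigen = {u: 1.0 for u in users}
for _ in range(100):
    new = {u: 0.0 for u in users}
    for voter, targets in upvotes.items():
        for t in targets:
            new[t] += eigen[voter] / len(targets)
    norm = sum(new.values()) or 1.0
    eigen = {u: v / norm for u, v in new.items()}

print(correlation([simple[u] for u in users], [eigen[u] for u in users]))
```

Running something like this over real vote data would give the correlation figure the comment speculates about, rather than a guess.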
What’s the benefit? Also, what’s the harm? (to you)
Main benefits to karma are feedback for writers (both informative and hedonic) and sorting for attention conservation. Main costs are supporting the underlying tech, transparency / explaining the system, and dealing with efforts to game it.
(For example, if we just clicked a radio button and we had eigenkarma, I would be much more optimistic about it. As is, there are other features I would much rather have.)
Strongly seconded. I think there should be no karma system.
I commented on LW 2.0 itself about another reason why a karma system is bad.
Yeah, I agree that people need to weight experts highly. LW pays lip service to this, but only that; basically, as soon as people have a strong opinion, experts get discarded. Started with EY.
My impression of how to do this is to give experts an “as an expert, I...” vote. So you could see that a post has 5 upvotes and a beaker downvote, and say “hmm, the scientist thinks this is bad and other people think it’s good.”
Multiple flavors let you separate out different parts of the comment in a way that’s meaningfully distinct from the Slashdot-style “everyone can pick a descriptor”: you don’t want everyone to be able to say “that’s funny,” just the comedians.
This works somewhat better than simple vote weighting because it lets people say whether they’re doing this as just another reader or in their professional capacity; I want Ilya’s votes on stats comments to be very highly weighted, and I want his votes on, say, rationality quotes to be weighted roughly like anyone else’s.
Of course, this sketch has many problems of its own. As written, I lumped many different forms of expertise into “scientist,” and you’re trusting the user to vote in the right contexts.
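A hedged toy illustration of keeping the ordinary tally and per-topic expert (“beaker”) tallies separate, so a post could display something like “5 upvotes and a beaker downvote” (the class and field names, topics, and rendering are assumptions for illustration; who is granted an expert vote in which topic would still be decided by hand, as noted below):

```python
from dataclasses import dataclass, field

# Hypothetical sketch: ordinary votes and per-topic expert votes are kept as
# separate tallies, so both can be shown side by side on a post.

@dataclass
class Tally:
    ordinary: int = 0
    expert: dict = field(default_factory=dict)  # topic -> net expert votes

    def vote(self, direction, expert_topic=None):
        # direction is +1 or -1; expert_topic marks a vote cast "as an expert".
        if expert_topic is None:
            self.ordinary += direction
        else:
            self.expert[expert_topic] = self.expert.get(expert_topic, 0) + direction

    def render(self):
        parts = [f"{self.ordinary:+d} ordinary"]
        parts += [f"{n:+d} {topic} (expert)" for topic, n in self.expert.items()]
        return ", ".join(parts)

t = Tally()
for _ in range(5):
    t.vote(+1)                       # five ordinary upvotes
t.vote(-1, expert_topic="stats")     # one expert ("beaker") downvote on stats
print(t.render())                    # "+5 ordinary, -1 stats (expert)"
```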
If you have a more-legible quality signal (in the James C. Scott sense of “legibility”), and a less-legible quality signal, you will inevitably end up using the more-legible quality signal more, and the less-legible one will be ignored—even if the less-legible one is tremendously more accurate and valuable.
Your suggestion is not implausible on its face, but the devil is in the details. No doubt you know this, as you say “this sketch has many problems of its own”. But these details and problems conspire to make such a formalized version of the “expert’s vote” either substantially decoupled from what it’s supposed to represent, or not nearly as legible as the simple “people’s vote”. In the former case, what’s the point? In the latter case, the result is that the “people’s vote” will remain much more influential on visibility, ranking, inclusion in canon, contribution to a member’s influence in various ways, and everything else you might care to use such formalized rating numbers for.
The question of reputation, and of whose opinion to trust and value, is a deep and fundamental one. I don’t say it’s impossible to algorithmize, but if possible, it is surely quite difficult. And simple karma (based on unweighted votes) is, I think, a step in the wrong direction.
As far as an algorithm for reputation goes, academia seems to have something that sort of scales in the form of citations and co-authors:
http://www.overcomingbias.com/2017/08/the-problem-with-prestige.html
It’s certainly a difficult problem, however.
Vaniver, I sympathize with the desire to automate figuring out who the experts are via point systems, but consider that even in academia (with a built-in citation PageRank), people still rely on names. That’s evidence that PageRank-style systems aren’t great on their own. People game the hell out of citations.
Probably should weight my opinion on rationality stuff quite low; I am neither a practitioner nor a historian of rationality. I have gotten gradually more pessimistic about the whole project.
To be clear, in this scheme whether or not someone had access to the expert votes would be set by hand.
What is going to be the definition of “an expert” in LW 2.0?
From context, it’s clearly (conditional on the feature being there at all) “someone accepted by the administrators of the site as an expert”. How they make that determination would be up to them; I would hope that (again, conditional on the thing happening at all) they would err on the side of caution and accept people as experts only in cases where few reasonable people would disagree.
“All animals are equal… ” X-)
The issue is credibility.
Is there anyone who makes it their business to guard against this?
Academics make it their business, and they rely on name recognition and social networks.