The shortage of reviews is both puzzling and concerning. One explanation is that the expected financial return from writing reviews for the prize money is not high enough to motivate the average LessWrong user, and the expected social prestige per unit of effort is lower for commenting on old things than for writing new ones. (It’s certainly true for me: I find commenting far easier than posting, but I’ve never gotten any social recognition from it, whereas my single LW post introduced me to about 50 people.)
Another potential reason is that it’s pretty hard to “review” the submissions. Like most essays on LessWrong, they state one or two big ideas and then spend the vast majority of their words explaining those ideas and connecting them to other things we know. This insight density is what makes them interesting, but it also makes it very hard to evaluate the theories within them. If you can’t examine the evidence behind a theory, you have to either assume it or challenge the theory as a whole, which is what usually happens in the comments section after a post is first published. If true, this means that you’re not really asking for reviews so much as for lengthy comments that say something that wouldn’t have been said last year.
Raw numbers to go with Bendini’s comment:
As of the time of writing this comment, there’ve been 82 reviews on the 75 qualified (i.e., twice-nominated) posts by 32 different reviewers. 24 reviews were by 18 different authors on their own posts.
Whether this counts as a shortage, is puzzling, or is concerning is a harder question to answer.
My quick thoughts:
Personally, I was significantly surprised by the level of contribution to the 2018 Review. It’s really hard to get people to do things (especially things that are New and Work) and I wouldn’t have been puzzled at all if the actual numbers had been 20% of what they actually are. Even the more optimistic LW team members had planned for a world where the team hunkered down and wrote all the reviews ourselves.
If we consider the relevant population of potential reviewers to be the same as those eligible to vote, i.e., users with 1000+ karma, then there are ~130 [1] such users who view at least one post on the site each week (~150 at the monthly timescale). That gives us 20-25% of active eligible voters writing reviews (rough arithmetic sketched below the footnote).
If you look at all users above 100 karma, the number is 8-10% of candidate reviewers engaging in the Review. People below 100 karma won’t have written many comments and/or probably haven’t been around for that long, so they aren’t likely candidates.
Relative to the people who could reasonably be expected to review, I think we’re doing decently, if something like 10-20% of people who could do something are doing it. Of course, there’s another question of why there aren’t more people with 100+ or 1000+ karma around to begin with, but it’s probably not to do with the incentives or mechanics of the review.
[1] For reference, there are 430 users in the LessWrong database with more than 1000 karma.
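A minimal sketch of the arithmetic behind the 20-25% figure, assuming (my assumption, not fresh site data) that the 32 reviewers all fall within the ~130-150 active 1000+ karma users:

```python
# Back-of-envelope for the participation rate quoted above.
# Inputs are the numbers mentioned in this thread, not new database queries.
reviewers = 32             # distinct reviewers so far
weekly_active_1000 = 130   # ~1000+ karma users viewing a post each week
monthly_active_1000 = 150  # same, at the monthly timescale

low = reviewers / monthly_active_1000   # ~0.21
high = reviewers / weekly_active_1000   # ~0.25
print(f"{low:.0%}-{high:.0%} of active eligible voters writing reviews")
# -> "21%-25%", i.e. the 20-25% figure above (ignoring that some of the 32
#    reviewers are below 1000 karma, which would push the number down a bit)
```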
Those numbers look pretty good in percentage terms. I hadn’t thought about it from that angle and I’m surprised they’re that high.
FWIW, my original perception that there was a shortage was based on the ratio between the quantity of reviews and the quantity of new posts written since the start of the review period. In theory, the latter takes a lot more effort than the former, so it would be unexpected if more people do the higher-effort thing of their own accord while fewer people do the lower-effort thing despite explicit calls to action and $2000 in prize money.
Re: the ratio
The ratio isn’t obviously bad to me, depending on your expectation? Between the beginning of the review on Dec 8th and Jan 3rd [1], there have been 199 posts (excluding question posts but not excluding link posts), but of those:
- 149 posts written by 66 users with over 100 karma
- 95 written by 33 users above 1000 karma (the most relevant comparison)
- 151 posts written by 75 people whose account was first active before 2019.
Comparing those with the 82 reviews by 32 reviewers, the reviews:posts ratio is between 1:1 and 1:2.
I’m curious if you’d been expecting something much different. [ETA: because of the incomplete data you might want to say 120 posts vs 82 reviews which is 1:1.5.]
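To make the ratio arithmetic explicit, here’s a quick sketch using the counts above (the labels are mine; the last entry is the rough adjustment mentioned in the ETA):

```python
# Reviews-to-posts ratios implied by the counts in this comment.
reviews = 82
post_counts = {
    "all posts, Dec 8 - Jan 3": 199,
    "posts by users with 100+ karma": 149,
    "posts by users with 1000+ karma": 95,
    "ETA-adjusted post count": 120,
}
for label, posts in post_counts.items():
    print(f"{label}: reviews:posts = 1:{posts / reviews:.1f}")
# -> 1:2.4, 1:1.8, 1:1.2 and 1:1.5 respectively; the karma-filtered
#    comparisons land between 1:1 and 1:2 as stated above.
```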
Re: the effort
It’s not clear to me that the effort involved means you should expect more reviews: 1) I think the benefit-to-cost ratio for posts is higher even if they take longer, 2) reviewing a post only happens if you’ve read the post and it impacted you enough that you remember it and feel motivated to say something about it, 3) when I write posts, it’s about something I’ve been thinking about and am excited about; I haven’t developed any habit of being excited about reviews, since I’m not used to them.
[1] That’s when I last pulled that particular data onto my machine and I’m being a bit lazy, because 8 more days isn’t going to change the overall picture; though it does mean the relative numbers are a bit worse for reviews.
Okay, so 80% of the reviewers have >1000 karma, and 90% have >=463; which means I think the “20-25% of eligible review voters are writing reviews” number is correct, if this methodology actually makes sense.
I also buy the econ story here (and, per Ruby, I’m somewhat pleasantly surprised by the amount of reviewing activity given this).
General observation suggests that people won’t find writing reviews that intrinsically motivating (compare to just writing posts, which all the authors are doing ‘for free’ with scant chance of reward, also compare to academia—I don’t think many academics find peer review/refereeing one of the highlights of their job). With apologies for the classic classical econ joke, if reviewing was so valuable, how come people weren’t doing it already? [It also looks like ~25%? of reviews, especially the most extensive, are done by the author on their own work].
If we assume there’s little intrinsic motivation (I’m comfortably in the ‘you’d have to pay me’ camp), the money doesn’t offer that much incentive. Given Ruby’s numbers, suppose each of the 82 reviews takes an average of 45 minutes or so (factoring in (re)reading time and similar). If the nomination money is roughly allocated by person-time spent, the marginal expected return of me taking an hour to review is something like $40. Facially, this isn’t too bad an hourly rate, but the real value is significantly lower (a back-of-envelope sketch follows the list below):
The ‘person-time lottery’ model should not be denominated by observed person-time so far, but by one’s expectation of how much will be spent in total once reviewing finishes, which will be higher (especially conditional on posts like this).
It’s very unlikely the reward is going to be allocated proportionately to time spent (or some crude proxy thereof, like word count). Thus the EV would be discounted by whatever degree of risk aversion one has (I expect the modal ‘payout’ for a review to be $0).
Opaque allocation also incurs further EV-reducing uncertainty, but best guesses suggest there will be Pareto-principle/tournament-style game dynamics, so those with (e.g.) reasons to believe their ‘pruning’ is less likely to impress the mod team’s evaluation have strong reasons to select themselves out.
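For concreteness, a back-of-envelope sketch of the proportional-allocation baseline described above. The $2000 pool and 45-minutes-per-review figures are assumptions pulled from this thread, not an official allocation rule:

```python
# Naive "person-time lottery" baseline, before applying the discounts above.
prize_pool_usd = 2000      # prize money figure quoted upthread (assumption)
reviews_so_far = 82
hours_per_review = 0.75    # ~45 minutes, including (re)reading time

total_person_hours = reviews_so_far * hours_per_review    # ~61.5 hours
naive_hourly_rate = prize_pool_usd / total_person_hours   # ~$33/hour
print(f"~{total_person_hours:.1f} person-hours -> ~${naive_hourly_rate:.0f}/hour")
# In the same ballpark as the ~$40/hour headline figure; the real expected
# value is lower once you account for more total hours by the deadline,
# non-proportional and opaque allocation, and risk aversion.
```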
Helpful thoughts, thanks!
I definitely don’t expect the money to be directly rewarding in a standard monetary sense. (In general I think prizes do a bad job of providing expected monetary value). My hope for the prize was more to be a strong signal of the magnitude of how much this mattered, and how much recognition reviews would get.
It’s entirely plausible that reviewing is sufficiently “not sufficiently motivating” that actually, the thing to do is pay people directly for it. It’s also possible that the prizes should be lopsided in favor of reviews. (This year the whole process was a bit of an experiment, so we didn’t want to spend too much money on it, but it might be that just adding more funding to subsidize things is the answer.)
But I had some reason to think “actually things are mostly fine, it’s just that the Review was a new thing and not well understood, and communicating more clearly about it might help.”
My current sense is:
There have been some critical reviews, so there is at least some latent motivation to do so.
There are people on the site who seem generally interested in giving critical feedback, and I was kinda hoping they’d be up for doing so as part of a broader project. (Some of them have, but not as many as I’d hoped. To be fair, I think the job being asked of them for the 2018 Review is harder than what they normally do.)
One source of motivation I’d expected to tap into (which I do think has happened a bit) is “geez, that might be going into the official Community Recognized Good Posts Book? Okay, before it wasn’t worth worrying about Someone Being Wrong On the Internet, but now the stakes are raised and it is worth it.”
Agree with these reasons why this is hard. A few thoughts (this is all assuming you’re the sort of person who basically thinks the Review makes sense as a concept and wants to participate; obviously this may not apply to Mark):
Re: Prestige: I don’t know if this helps, but to be clear, I expect to include good reviews in the Best of 2018 book itself. I’m personally hoping that each post comes with at least one review, and in the event that there are deeply substantive reviews, those may be given the equivalent of top billing. I’m not 100% sure what will happen with reviews in the online sequence.
(In fact, I expect reviews to be a potentially easier way to end up in the book than writing posts, since the target area is more clearly specified.)
“It’s Hard to Review Posts”
This is definitely true. Often what needs reviewing is less like “author made an unsubstantiated claim or logical error” and more like “is the entire worldview that generated the post, and the connections the post made to the rest of the world, reasonable? Does it contain subtle flaws? Are there better frames for carving up the world than the one in the post?”
This is a hard problem, and doing a good job is honestly harder than one month of work. But this seems like quite an important problem for LessWrong to be able to solve. I think a lot of this site’s value comes from people crystallizing ideas that shift one’s frame, in domains where evidence is hard to come by. “How to evaluate that?” feels like an essential question for us to figure out how to answer.
My best guess for now is for reviews not to try to fully answer “does this post check out?” (in cases where that depends on a lot of empirical questions that are hard to check, or where “is this the right ontology?” is itself hard to answer), but instead to try to map out “what are the questions I would want answered, that would help me figure out whether this post checks out?”
(Example of this includes Eli Tyre’s “Has there been a memetic collapse?” question, relating to Eliezer’s claims in Local Validity)
Often what needs reviewing is less like “author made an unsubstantiated claim or logical error” and more like “is the entire worldview that generated the post, and the connections the post made to the rest of the world, reasonable?”
I agree with this, but given that these posts were popular because lots of people thought they were true and important, deeming the entire worldview of the author flawed would also imply that the worldview of the community was flawed. It’s certainly possible that the community’s entire worldview is flawed, but even if you believe that to be true, it would be very difficult to explain in a way that people would find believable.
[edit: I re-read your comment and mostly retract mine, but am thinking about a new version of it]
Have you got authorization from authors/copyright holders to do a book compendium?
Everyone will get contacted about inclusion in the book with the opportunity to opt out.
FWIW from a karma perspective I’ve found writing reviews to be significantly more profitable than most comments. IDK how this translates into social prestige though.
I’m not surprised to learn that is the case.
This is my understanding of how karma maps to social prestige:
People with existing social prestige will be given more karma for a post or a comment than if it was written by someone unknown to the community.
Posts with more karma tend to be more interesting, which helps boost the author’s prestige because more people will click on a post with higher karma.
Comments with high karma are viewed as more important.
Comments with higher karma than other comments in the same thread are viewed as the correct opinion.
Virtually nobody looks at how much karma you’ve got to figure out how seriously to take your opinions. This is probably because by the time you have accumulated enough for it to mean something, regulars will already associate your username with good content.
Not all of us agree with the project. I disagree with the entire concept of “pruning” output in this way. I wouldn’t participate on principle.
I’d be curious to learn the alternative ways you favor, or more detail on why this approach is flawed. Standard academic peer review has its issues, but seemingly a community should have a way it reviews material and determines what’s great, what needs work, and what is plain wrong.
Well part of rationality is being able to assess and integrate this information yourself, rather than trusting in the accuracy of curators (which reinforces bad habits IMHO, hence the concern). Things that are useful get referenced, build citations, and are therefore more visible and likely to be found.
Do you think there are any ways the 2018 Review as we’ve been doing it could be modified to be better along the dimensions you’re concerned about?
Sorry if I wasn’t clear: I don’t think it’s a useful thing to do, full stop.
I don’t mean to rain on anyone’s parade. I was really just replying to the top-level comment which started with:
The shortage of reviews is both puzzling and concerning...
I was just pointing out that some people aren’t participating because they don’t find the project worth doing in the first place. To me it’s just noise. I’m not going to get in the way of anyone else if they want to contribute, but if you’re wondering why there is a shortage of reviews... well, I gave my reasons for not contributing.
Yeah, true, that seems like a fair reason to point out for why there wouldn’t be more reviews. Thanks for sharing your personal reasons.
That makes sense. As I’m wont to say, there are often risks/benefits/costs in each direction.
Ways in which I think communal and collaborative review are imperative:
Public reviews help establish the standards of reasoning expected in the community.
By reading other people’s evaluations, you can better learn how to perform your own.
It’s completely time-prohibitive for me to thoroughly review every post that I might reference; instead, I trust the author. Dangerously, many people might do this, and a post becomes highly cited despite flaws that would be exposed if a person or two spent several hours evaluating it.*
I might be competent to understand and reference a paper, but lack the domain expertise to review it myself. The review of another domain expert can help me understand the shortcomings of a post.
And as I think has been posted about, having a coordinated “review festival” is ideally an opportunity for people with different opinions about controversial topics to get together and hash it out. In an ideal world, review is the time when the community gets together to resolve what debates it can.
*An example is the work I began auditing the paper Eternity in Six Hours which is tied to the Astronomical Waste argument. Many people reference that argument, but as far as I know, few people have spent much time attempting to systematically evaluate its claims. (I do hope to finish that work and publish more on it sometime.)
So posts should be (pre-)processed for theory/experimentation? (Or distilled?)
Other possible factors:
Maybe people read newer posts instead of (re-)reading older posts
the time of the year (in which reviews occurred)
the length of time open for reviews
the set of users reviews are open to
the set of posts open to review. For example, these are from long ago. (Perhaps if there was a 1 year retrospective, and 2 year, and so on up to 5 years, that could capture engagement earlier, and get ideas for short term and longer term effects.)
some trivial inconveniences around reading the posts to be reviewed (probably already addressed, but did that affect things a lot?)
Note that all users can write reviews. (It’s only voting and nomination that are restricted to high-ish karma.)