The shortage of reviews is both puzzling and concerning. One explanation is that the expected financial return from writing reviews for the prize money is not high enough to motivate the average LessWrong user, and the expected social prestige per unit of effort is lower for commenting on old things than for writing new ones. (It’s certainly true for me: I find commenting far easier than posting, but I’ve never gotten any social recognition from it, whereas my single LW post introduced me to about 50 people.)
Another potential reason is that it’s pretty hard to “review” the submissions. Like most essays on LessWrong, they state one or two big ideas and then spend the vast majority of their words explaining those ideas and connecting them to other things we know. This insight density is what makes them interesting, but it also makes it very hard to evaluate the theories within them. If you can’t examine the evidence behind a theory, you have to either assume it or challenge the theory as a whole, which is what usually happens in the comments section after a post is first published. If true, this means that you’re not really asking for reviews so much as for lengthy comments that say something that wouldn’t have been said last year.
Raw numbers to go with Bendini’s comment:
As of the time of writing this comment, there’ve been 82 reviews on the 75 qualified (i.e., twice-nominated) posts by 32 different reviewers. 24 reviews were by 18 different authors on their own posts.
Whether this counts as a shortage, is puzzling, or is concerning is a harder question to answer.
My quick thoughts:
Personally, I was significantly surprised by the level of contribution to the 2018 Review. It’s really hard to get people to do things (especially things that are New and Work) and I wouldn’t have been puzzled at all if the actual numbers had been 20% of what they actually are. Even the more optimistic LW team members had planned for a world where the team hunkered down and wrote all the reviews ourselves.
If we consider the relevant population of potential reviewers to be the same as those eligible to vote, i.e., users with 1000+ karma, then there are ~130 [1] such users who view at least one post on the site each week (~150 at the monthly timescale). That gives us 20-25% of active eligible voters writing reviews (rough arithmetic sketched below the footnote).
If you look at all users above 100 karma, the number is 8-10% of candidate reviewers engaging in the Review. People below 100 karma won’t have written many comments and/or probably haven’t been around for that long, so they aren’t likely candidates.
Relative to the people who could reasonably be expected to review, I think we’re doing decently, if something like 10-20% of people who could do something are doing it. Of course, there’s another question of why there aren’t more people with 100+ or 1000+ karma around to begin with, but it’s probably not to do with the incentives or mechanics of the review.
[1] For reference, there are 430 users in the LessWrong database with more than 1000 karma.
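A minimal sketch of the arithmetic behind the 20-25% figure, assuming (my assumption, not fresh site data) that the 32 reviewers all fall within the ~130-150 active 1000+ karma users:

```python
# Back-of-envelope for the participation rate quoted above.
# Inputs are the numbers mentioned in this thread, not new database queries.
reviewers = 32             # distinct reviewers so far
weekly_active_1000 = 130   # ~1000+ karma users viewing a post each week
monthly_active_1000 = 150  # same, at the monthly timescale

low = reviewers / monthly_active_1000   # ~0.21
high = reviewers / weekly_active_1000   # ~0.25
print(f"{low:.0%}-{high:.0%} of active eligible voters writing reviews")
# -> "21%-25%", i.e. the 20-25% figure above (ignoring that some of the 32
#    reviewers are below 1000 karma, which would push the number down a bit)
```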
Those numbers look pretty good in percentage terms. I hadn’t thought about it from that angle and I’m surprised they’re that high.
FWIW, my original perception that there was a shortage was based on the ratio between the quantity of reviews and the quantity of new posts written since the start of the review period. In theory, the latter takes a lot more effort than the former, so it would be unexpected if more people do the higher-effort thing of their own accord while fewer people do the lower-effort thing despite explicit calls to action and $2000 in prize money.
Re: the ratio
The ratio isn’t obviously bad to me, depending on your expectation? Between the beginning of the review on Dec 8th and Jan 3rd [1], there have been 199 posts (excluding question posts but not excluding link posts), but of those:
- 149 posts written by 66 users with over 100 karma
- 95 written by 33 users above 1000 karma (the most relevant comparison)
- 151 posts written by 75 people whose account was first active before 2019.
Comparing those with the 82 reviews by 32 reviewers, the reviews:posts ratio is between 1:1 and 1:2.
I’m curious if you’d been expecting something much different. [ETA: because of the incomplete data you might want to say 120 posts vs 82 reviews which is 1:1.5.]
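To make the ratio arithmetic explicit, here’s a quick sketch using the counts above (the labels are mine; the last entry is the rough adjustment mentioned in the ETA):

```python
# Reviews-to-posts ratios implied by the counts in this comment.
reviews = 82
post_counts = {
    "all posts, Dec 8 - Jan 3": 199,
    "posts by users with 100+ karma": 149,
    "posts by users with 1000+ karma": 95,
    "ETA-adjusted post count": 120,
}
for label, posts in post_counts.items():
    print(f"{label}: reviews:posts = 1:{posts / reviews:.1f}")
# -> 1:2.4, 1:1.8, 1:1.2 and 1:1.5 respectively; the karma-filtered
#    comparisons land between 1:1 and 1:2 as stated above.
```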
Re: the effort
It’s not clear to me that the effort involved means you should expect more reviews: 1) I think the benefit-to-cost ratio for posts is higher even if they take longer, 2) reviewing a post only happens if you’ve read the post and it impacted you enough that you remember it and feel motivated to say something about it, 3) when I write posts, it’s about something I’ve been thinking about and am excited about; I haven’t developed any habit of being excited about reviews, since I’m not used to them.
[1] That’s when I last pulled that particular data onto my machine and I’m being a bit lazy, because 8 more days isn’t going to change the overall picture; though it does mean the relative numbers are a bit worse for reviews.
Okay, so 80% of the reviewers have >1000 karma, and 90% have >=463; which means I think the “20-25% of eligible review voters are writing reviews” number is correct, if this methodology actually makes sense.
I also buy the econ story here (and, per Ruby, I’m somewhat pleasantly surprised by the amount of reviewing activity given this).
General observation suggests that people won’t find writing reviews that intrinsically motivating (compare to just writing posts, which all the authors are doing ‘for free’ with scant chance of reward, also compare to academia—I don’t think many academics find peer review/refereeing one of the highlights of their job). With apologies for the classic classical econ joke, if reviewing was so valuable, how come people weren’t doing it already? [It also looks like ~25%? of reviews, especially the most extensive, are done by the author on their own work].
If we assume there’s little intrinsic motivation (I’m comfortably in the ‘you’d have to pay me’ camp), the money doesn’t offer that much incentive. Given Ruby’s numbers, suppose each of the 82 reviews takes an average of 45 minutes or so (factoring in (re)reading time and similar). If the nomination money is roughly allocated by person-time spent, the marginal expected return of me taking an hour to review is something like $40. Facially, this isn’t too bad an hourly rate, but the real value is significantly lower (a back-of-envelope sketch follows the list below):
The ‘person-time lottery’ model should not be denominated by observed person-time so far, but by one’s expectation of how much will be spent in total once reviewing finishes, which will be higher (especially conditional on posts like this).
It’s very unlikely the reward is going to be allocated proportionately to time spent (or some crude proxy thereof, like word count). Thus the EV would be discounted by whatever degree of risk aversion one has (I expect the modal ‘payout’ for a review to be $0).
Opaque allocation also incurs further EV-reducing uncertainty, but best guesses suggest there will be Pareto-principle/tournament-style game dynamics, so those with (e.g.) reasons to believe their ‘pruning’ is less likely to impress the mod team’s evaluation have strong reasons to select themselves out.
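For concreteness, a back-of-envelope sketch of the proportional-allocation baseline described above. The $2000 pool and 45-minutes-per-review figures are assumptions pulled from this thread, not an official allocation rule:

```python
# Naive "person-time lottery" baseline, before applying the discounts above.
prize_pool_usd = 2000      # prize money figure quoted upthread (assumption)
reviews_so_far = 82
hours_per_review = 0.75    # ~45 minutes, including (re)reading time

total_person_hours = reviews_so_far * hours_per_review    # ~61.5 hours
naive_hourly_rate = prize_pool_usd / total_person_hours   # ~$33/hour
print(f"~{total_person_hours:.1f} person-hours -> ~${naive_hourly_rate:.0f}/hour")
# In the same ballpark as the ~$40/hour headline figure; the real expected
# value is lower once you account for more total hours by the deadline,
# non-proportional and opaque allocation, and risk aversion.
```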
Helpful thoughts, thanks!
I definitely don’t expect the money to be directly rewarding in a standard monetary sense. (In general I think prizes do a bad job of providing expected monetary value). My hope for the prize was more to be a strong signal of the magnitude of how much this mattered, and how much recognition reviews would get.
It’s entirely plausible that reviewing is sufficiently “not sufficiently motivating” that actually, the thing to do is pay people directly for it. It’s also possible that the prizes should be lopsided in favor of reviews. (This year the whole process was a bit of an experiment, so we didn’t want to spend too much money on it, but it might be that just adding more funding to subsidize things is the answer.)
But I had some reason to think “actually things are mostly fine, it’s just that the Review was a new thing and not well understood, and communicating more clearly about it might help.”
My current sense is:
There have been some critical reviews, so there is at least some latent motivation to do so.
There are people on the site who seem generally interested in giving critical feedback, and I was kinda hoping they’d be up for doing so as part of a broader project. (Some of them have, but not as many as I’d hoped. To be fair, I think the job being asked of them for the 2018 Review is harder than what they normally do.)
One source of motivation I’d expected to tap into (which I do think has happened a bit) is “geez, that might be going into the official Community Recognized Good Posts Book? Okay, before it wasn’t worth worrying about Someone Being Wrong On the Internet, but now the stakes are raised and it is worth it.”
Agree with these reasons why this is hard. A few thoughts (this is all assuming you’re the sort of person who basically thinks the Review makes sense as a concept and wants to participate; obviously this may not apply to Mark):
Re: Prestige: I don’t know if this helps, but to be clear, I expect to include good reviews in the Best of 2018 book itself. I’m personally hoping that each post comes with at least one review, and in the event that there are deeply substantive reviews, those may be given the equivalent of top billing. I’m not 100% sure what will happen with reviews in the online sequence.
(In fact, I expect reviews to be a potentially easier way to end up in the book than writing posts, since the target area is more clearly specified.)
“It’s Hard to Review Posts”
This is definitely true. Often what needs reviewing is less like “author made an unsubstantiated claim or logical error” and more like “is the entire worldview that generated the post, and the connections the post made to the rest of the world, reasonable? Does it contain subtle flaws? Are there better frames for carving up the world than the one in the post?”
This is a hard problem, and doing a good job is honestly harder than one month of work. But this seems like quite an important problem for LessWrong to be able to solve. I think a lot of this site’s value comes from people crystallizing ideas that shift one’s frame, in domains where evidence is hard to come by. “How to evaluate that?” feels like an essential question for us to figure out how to answer.
My best guess for now is for reviews not to try to fully answer “does this post check out?” (in cases where that depends on a lot of empirical questions that are hard to check, or where “is this the right ontology?” is itself hard to answer), but instead to try to map out “what are the questions I would want answered, that would help me figure out whether this post checks out?”
(Example of this includes Eli Tyre’s “Has there been a memetic collapse?” question, relating to Eliezer’s claims in Local Validity)
Often what needs reviewing is less like “author made an unsubstantiated claim or logical error” and more like “is the entire worldview that generated the post, and the connections the post made to the rest of the world, reasonable?”
I agree with this, but given that these posts were popular because lots of people thought they were true and important, deeming the entire worldview of the author flawed would also imply that the worldview of the community was flawed. It’s certainly possible that the community’s entire worldview is flawed, but even if you believe that to be true, it would be very difficult to explain in a way that people would find believable.
[edit: I re-read your comment and mostly retract mine, but am thinking about a new version of it]
Have you got authorization from authors/copyright holders to do a book compendium?
Everyone will get contacted about inclusion in the book with the opportunity to opt out.
FWIW from a karma perspective I’ve found writing reviews to be significantly more profitable than most comments. IDK how this translates into social prestige though.
I’m not surprised to learn that is the case.
This is my understanding of how karma maps to social prestige:
People with existing social prestige will be given more karma for a post or a comment than if it was written by someone unknown to the community.
Posts with more karma tend to be more interesting, which helps boost the author’s prestige because more people will click on a post with higher karma.
Comments with high karma are viewed as more important.
Comments with higher karma than other comments in the same thread are viewed as the correct opinion.
Virtually nobody looks at how much karma you’ve got to figure out how seriously to take your opinions. This is probably because by the time you have accumulated enough for it to mean something, regulars will already associate your username with good content.
Not all of us agree with the project. I disagree with the entire concept of “pruning” output in this way. I wouldn’t participate on principle.
I’d be curious to learn the alternative ways you favor, or more detail on why this approach is flawed. Standard academic peer review has its issues, but seemingly a community should have a way it reviews material and determines what’s great, what needs work, and what is plain wrong.
Well part of rationality is being able to assess and integrate this information yourself, rather than trusting in the accuracy of curators (which reinforces bad habits IMHO, hence the concern). Things that are useful get referenced, build citations, and are therefore more visible and likely to be found.
Do you think there are any ways the 2018 Review as we’ve been doing it could be modified to be better along the dimensions you’re concerned about?
Sorry if I wasn’t clear: I don’t think it’s a useful thing to do, full stop.
I don’t mean to rain on anyone’s parade. I was really just replying to the top-level comment which started with:
The shortage of reviews is both puzzling and concerning...
I was just pointing out that some people aren’t participating because they don’t find the project worth doing in the first place. To me it’s just noise. I’m not going to get in the way of anyone else if they want to contribute, but if you’re wondering why there is a shortage of reviews... well, I gave my reasons for not contributing.
Yeah, true, that seems like a fair reason to point out for why there wouldn’t be more reviews. Thanks for sharing your personal reasons.
That makes sense. As I’m wont to say, there are often risks/benefits/costs in each direction.
Ways in which I think communal and collaborative review are imperative:
Public reviews help establish the standards of reasoning expected in the community.
By reading other people’s evaluations, you can better learn how to perform your own.
It’s completely time-prohibitive for me to thoroughly review every post that I might reference; instead, I trust the author. Dangerously, many people might do this, and a post becomes highly cited despite flaws that would be exposed if a person or two spent several hours evaluating it.*
I might be competent to understand and reference a paper, but lack the domain expertise to review it myself. The review of another domain expert can help me understand the shortcomings of a post.
And as I think has been posted about, having a coordinated “review festival” is ideally an opportunity for people with different opinions about controversial topics to get together and hash it out. In an ideal world, review is the time when the community gets together to resolve what debates it can.
*An example is the work I began auditing the paper Eternity in Six Hours which is tied to the Astronomical Waste argument. Many people reference that argument, but as far as I know, few people have spent much time attempting to systematically evaluate its claims. (I do hope to finish that work and publish more on it sometime.)
So posts should be (pre-)processed for theory/experimentation? (Or distilled?)
Other possible factors:
Maybe people read newer posts instead of (re-)reading older posts
the time of the year (in which reviews occurred)
the length of time open for reviews
the set of users reviews are open to
the set of posts open to review. For example, these are from long ago. (Perhaps if there was a 1 year retrospective, and 2 year, and so on up to 5 years, that could capture engagement earlier, and get ideas for short term and longer term effects.)
some trivial inconveniences around reading the posts to be reviewed (probably already addressed, but did that affect things a lot?)
Note that all users can write reviews. (It’s only voting and nomination that are restricted to high-ish karma.)