Is this an “arguments as soldiers” thing? Compare an isomorphic argument: “how did medicine get done for the centuries before antibiotics.”
That’s not isomorphic. To put it bluntly, medicine didn’t. It only started becoming net beneficial extremely recently (and even now tons of medicine is harmful or a pure waste), based on copying a tremendous amount of basic science like biology and bacteriology, benefitting from others’ discoveries, and importing methodology like randomized trials (which it still chafes at), and not by importing peer review. Up until the very late 1800s or so, you would often have been better off ignoring doctors if you were, say, an expecting mother wondering whether to give birth in a hospital pre-Semmelweis. You can’t expect too much help from a field which published its first RCT in 1948 (on, incidentally, an antibiotic).
Leaving aside that this is an argument from authority,
I include it as a piquant anecdote since you seem to have no interest in looking up any of the statistical evidence on the unreliability and biases (in the statistical senses) of peer review, or the absence of any especial evidence that it works.
But: “they also laughed at Bozo the Clown.”
That is not what I am saying. I am saying, ‘if you think MIRI is Bozo the Clown, get a photograph of its leader and see if he has a red nose! See if his face is suspiciously white and the entire MIRI staff saves a remarkable amount on gas purchases because they can all fit into one small car to run their errands! Don’t deliberately look away and simply listen for the sound of laughter! That’s a terrible way of deciding!’
Good papers are very likely to get a fair shake and get published.
No, they’re not, or at the very least, you need to modify this to ‘after being forced to try repeatedly, solely thanks to the peer review process, a good paper may still finally be published’. For example, in the NIPS experiment, most accepted papers would not have been accepted by a different committee. Unsurprisingly, given the low inter-rater reliabilities in psychology for tons of things far less complicated than reviewing a paper, and the enormous variability you get when n=1 or 3.
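(To make the n=1-or-3 point concrete, here is a minimal toy simulation in Python. It is not a reconstruction of the actual NIPS analysis; the paper count, reviewer noise, panel size, and acceptance rate are all illustrative assumptions. It only shows how modest reviewer noise plus a hard acceptance cutoff produces large disagreement between two independent committees scoring the same papers.)

```python
# Toy simulation (illustrative assumptions only): two independent committees,
# each averaging a few noisy reviewer scores per paper, accept a fixed fraction.
# How often do they agree on which papers to accept?
import numpy as np

rng = np.random.default_rng(0)
n_papers = 2000        # hypothetical submission pool
accept_rate = 0.25     # assumed acceptance rate
n_reviewers = 3        # assumed reviewers per paper per committee
noise_sd = 1.0         # assumed reviewer noise, relative to a paper-quality SD of 1

quality = rng.normal(0.0, 1.0, n_papers)

def committee_accepts(quality, rng):
    # Each committee averages n_reviewers noisy scores and accepts the top slice.
    scores = quality[:, None] + rng.normal(0.0, noise_sd, (len(quality), n_reviewers))
    mean_score = scores.mean(axis=1)
    return mean_score >= np.quantile(mean_score, 1.0 - accept_rate)

a = committee_accepts(quality, rng)
b = committee_accepts(quality, rng)
print(f"Committee B also accepts {np.mean(b[a]):.0%} of committee A's accepts.")
```

With noisier reviewers or smaller panels the overlap drops further; the point is only that substantial churn is exactly what low inter-rater reliability plus a hard cutoff predicts.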
Absolute numbers are kind of useless here. Do you have some work in mind on false positive and false negative rates for peer review?
Yes, any of it. They all say that peer review is not a little but highly stochastic. This isn’t a new field by any means.
I asked you this before, gwern: how much experience with actual peer review (let’s say in applied stats journals, as that is closest to what you do) do you have?
I have little first-hand experience; my vitriol comes mostly from having read over the literature showing peer review to be highly unreliable and biased, from the unthinking respect and overestimation of it that most people give it, from being shocked at how awful many published studies are despite being ‘peer reviewed’, and from talking to researchers and learning how pervasive bias is in the process and how reviewers enforce particular cliques & theories (some politically-motivated) and try to snuff opposition in the cradle.
The first represents a huge waste of time; the second hinders scientific progress directly and contributes to one of the banes of my existence as a meta-analyst, publication bias (why do we have a ‘grey literature’ in the first place?); the third is seriously annoying in trying to get most people to wake up and think a little about the research they read about (‘but it’s peer-reviewed!’); and the fourth is simply enraging as the issue moves from an abstract, general science-wide problem to something I can directly perceive specifically harming me and my attempts to get accurate beliefs.
(Well, actually I think my analysis of Silk Road 2 listings is supposed to be peer-reviewed, but the lead author is handling the bureaucracy so I can’t say anything directly about how good or bad the reviewers for that journal are, aside from noting that this was a case of problem #4: the paper we were responding to is so egregiously, obviously wrong that the journal’s reviewers must have been either morons or totally ignorant of the paper topic they were supposed to be reviewing. I’m still shocked & baffled about this: how does an apparently respectable journal wind up publishing a paper claiming, essentially, that Silk Road 2 did not sell drugs? This would have been caught in a heartbeat by any kind of remotely public process—even one person who had actually used Silk Road 1 or 2 peeking in on the paper could have laughed it out of the room—but because the journal is ‘peer reviewed’… Pace the Gell-Mann Amnesia effect, it makes me wonder about all the papers published about topics I am not so knowledgeable about as I am on Silk Road 2, and whether I am still not cynical enough.)
I don’t think we disagree here; I think this is a form of peer review. I routinely do this with my papers, and am asked to look over preprints by others. I think this is fine for certain types of papers (generally very specialized or very large/weighty ones).
Yes, I have no objection to ‘peer review’ if what you mean by it is all the things I singled out as opposed to, prior to, and after the institution of peer review: having colleagues critique your work, having many other people with different perspectives & knowledge check it over and replicate it and build on it and post essays rebutting it—all this is great stuff, we both agree. I would say replication is the most important of those elements, but all have their place.
What I am attacking is the very specific formal institutional practice of journals outsourcing editorial judgment to a few selected researchers and effectively giving them veto power, a process which hardly seems calculated to yield very good results and which does not seem to have been institutionalized because it has been rigorously demonstrated to work far better than the pre-existing alternatives (which of course it wasn’t, any more than medical proposals at that time were routinely put through RCTs first, even though we know how many good-sounding proposals in psychology & sociology & economics & medicine go down in flames when they are rigorously tested), but—to go off on a more speculative tangent here—whose chief purpose was to simply make the bureaucracy of science scale to the post-WWII expansion of science as part of the Cold War/Vannevar Bush academic-military-government complex.
The worry is MIRI’s conception of what a “peer” is basically ignores the wider academic community (which has a lot of intellectual firepower), so they end up in a bubble.
If this is the problem with MIRI, I think there are far more informative ways to criticize them. For example, I don’t think you need to rely on any proxies or filters: you should be able to evaluate their work directly and form your own critique of whether it’s any good or if it seems like a good research avenue for their stated goals.
Honestly, you sound a bit angry about peer review.
Science is srs bsns. (I find it hard to see why other people can’t get worked up over things like publication bias or aging or p-hacking. They’re a lot more important than the latest outrage du jour. This stuff matters!)
That’s not isomorphic. To put it bluntly, medicine didn’t.
Medicine was often harmful in the past, with some occasional parts that helped, e.g. amputating gangrenous limbs was dangerous and people died, but probably was still a benefit on net. Admiral Nelson had multiple surgeries and was in serious danger of infection and death afterwards, but he would have been a goner for sure without surgery.
Science was pretty similar, it was mostly nonsense with occasional islands of sense. It didn’t really get underway until, what, Francis Bacon wrote about biases and empiricism? That is not very long ago. The early “gentlemen scholars” all did informal peer review by sending their stuff to each other (they also hid discoveries from each other due to competition and egos, but this stuff happens today too).
you seem to have no interest...
Gwern, peer review is my life. My tenure case will be decided by peer review, ultimately. I do peer review myself as a service, constantly. I know all about peer review.
get a photograph of its leader and see if he has a red nose!
The burden of proof is on MIRI, not on me. MIRI is the one that wants funding and people to save the world. It’s up to MIRI to use all available financial and intellectual resources out there, which includes engaging with academia.
I have little first-hand experience; my vitriol comes mostly from having read over the literature showing peer review to be highly unreliable and biased, from the unthinking respect and overestimation of it that most people give it, from being shocked at how awful many published studies are despite being ‘peer reviewed’, and from talking to researchers and learning how pervasive bias is in the process and how reviewers enforce particular cliques & theories (some politically-motivated) and try to snuff opposition in the cradle.
I really think you should moderate your criticism of peer review. Peer review for data analysis papers is very different from peer review for mathematics or theoretical physics. Fields are different and have vastly different cultural norms. Even in the same field, different conferences/journals may have different norms.
I find it hard to see why other people can’t get worked up over things like publication bias or aging or p-hacking.
I do a lot of theory. When I do data analysis, my collabs and I try to lead by example. What is the point of being angry? Angry outsiders just make people circle the wagons.
Admiral Nelson had multiple surgeries and was in serious danger of infection and death afterwards, but he would have been a goner for sure without surgery.
This argument seems exactly identical to the argument for trepanning, even including the survivorship bias. (One of the suspected uses of trepanning was to revive people otherwise thought dead.)
While we’re looking at anecdotes, this bit of Nelson’s experience with surgery seems relevant:
Although surgeons had been unable to remove the central ligature in his amputated arm, which had caused considerable inflammation and poisoning, in early December it came out of its own accord and Nelson rapidly began to recover.
I’m not sure I’d count that as a win for surgery, or evidence that he couldn’t have survived without it!
Gwern, peer review is my life. My tenure case will be decided by peer review, ultimately. I do peer review myself as a service, constantly. I know all about peer review.
But this means that, unless you’re particularly good at distancing yourself from your work, you should expect to be worse at judging it than a disinterested observer. The classic anecdote about “which half?” comes to mind, or the reaction of other obstetricians to Semmelweis’s concerns.
Regardless, we would expect that, if studies are better than anecdotes, studies on peer review will outperform anecdotes on peer review, right?
This argument seems exactly identical to the argument for trepanning, even including the survivorship bias.
It’s not identical because we know, with the benefit of hindsight, that amputating potentially gangrenous limbs is a good idea. The folks in the past had a solid empirical basis for amputations, even if they did not fully understand gangrene. Medicine was mostly, but not always, nonsense in the past. A lot of the stuff was not based on the scientific method, because they had no scientific method. But there were isolated communities that came up with sensible things for sensible reasons. This is one case where standard practices were sensible (there are other isolated examples, e.g. honey to disinfect wounds).
But this means that, unless you’re particularly good at distancing yourself from your work, you should expect to be worse at judging it than a disinterested observer.
studies on peer review will outperform anecdotes on peer review, right?
Ok, but isn’t this “incentive tennis?” Gwern’s incentives are clearer than mine here—he’s not a mainstream academic, so he loses out on status. So a “low motive” interpretation of the argument is: “your status castle is built on sand, tear it down!” Gwern is also pretty angry. Are we going to stockpile argument ammunition [X] of the form “you are more biased when evaluating peer review because of [X]”?
For me, peer review is a double-edged sword—I get papers rejected sometimes, and at other times I get silly reviewer comments, or editors that make me spend years revising. I have a lot of data both ways. The point with peer review is I sleep better at night due to extra sanity checking. Who sanity-checks MIRI’s whiteboard stuff?
A “low motive” argument for me would be “keep peer review, but have it softball all my papers, they are obviously so amazing why can’t you people see that!”
A “low motive” argument for MIRI would be “look buddy, we are trying to save the world here, we don’t have time for your flawed human institutions. Don’t you worry about our whiteboard content, you probably don’t know enough math to understand it anyways.” MIRI is doing pretty theoretical decision theory. Is that a good idea? Are they producing enough substantive work? In standard academia peer review would help with the former question, and answering to the grant agency and tenure pressure would help with the second. These are not perfect incentives, but they are there. Right now there are absolutely no guard rails in place preventing MIRI from going off the deep end.
Your argument basically says not to trust domain experts, that’s the opposite of what should be done.
Gwern also completely ignores effect modification (e.g. the practice of evaluating conditional effects after conditioning on things like paper topic). Peer review cultures for empirical social science papers and for theoretical physics papers basically have nothing to do with each other.
The folks in the past had a solid empirical basis for amputations, even if they did not fully understand gangrene.
I would put the start of a solid empirical basis for gangrene treatment at Middleton Goldsmith during the American Civil War (dropping mortality from 45% to 3%), about sixty years after Nelson.
This is one case when standard practices were sensible (there are other isolated examples, e.g. honey to disinfect wounds).
I think this is putting too much weight on superficial resemblance. Yes, gangrene treatment from Goldsmith to today involves amputation. But that does not mean amputation pre-Goldsmith actually decreased mortality over no treatment! My priors are pretty strong that it would increase it, but going into details on my priors is perhaps a digression. (The short version is that I take a very Hansonian view of medicine and its efficacy.) I’m not aware of (but would greatly appreciate) any evidence on that question.
(To see where I’m coming from, consider that there is a reference class that contains both “trepanning” and “brain surgery” that seems about as natural as the reference class that includes amputation before and after Goldsmith.)
The point with peer review is I sleep better at night due to extra sanity checking.
But this only makes sense if peer review actually improves the quality of studies. Do you believe that’s the case, and if so, why?
Your argument basically says not to trust domain experts, that’s the opposite of what should be done.
I think my argument is domain expert tennis. That is, I think that in order to evaluate whether or not peer review is effective, we shouldn’t ask scientists who use peer review, we should ask scientists who study peer review. Similarly, in order to determine whether a treatment is effective, we shouldn’t ask the users of the treatment, but statisticians. If you go down to the church/synagogue/mosque, they’ll say that prayer is effective, and they’re obviously the domain experts on prayer. I’m just applying the same principles and same level of skepticism.
Gwern also completely ignores effect modification (e.g. the practice of evaluating conditional effects after conditioning on things like paper topic). Peer review cultures for empirical social science papers and for theoretical physics papers basically have nothing to do with each other.
I am not sure what the relevance of either of these is. If anything, the latter suggests that we need to make the case for peer review field by field, and so proponents have an even harder time than they do without that claim!
I would put the start of a solid empirical basis for gangrene treatment at Middleton Goldsmith during the American Civil War (dropping mortality from 45% to 3%), about sixty years after Nelson.
I think treating gangrene by amputation was well known in the ancient world. Depending on how you deal w/ hemorrhage/complications you would have a pretty high post-surgery mortality rate, but the point is, it is still an improvement on gangrene killing you for sure.
Actually, while I didn’t look into this, I expect Jewish and Greek surgeons would have been pretty good compared to medieval European ones.
But that does not mean amputation pre-Goldsmith actually decreased mortality over no treatment!
I don’t have data from the ancient world :). But mortality from gangrene if you leave the dead tissue in place is what, >95%? Amputation didn’t have to be perfect or even very good, it merely had to do better than an almost certain death sentence.
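(A back-of-the-envelope version of that argument; both numbers below are guesses rather than historical data. The >95% figure is the one asserted above, and the surgical mortality rates are hypothetical.)

```python
# Break-even sketch: under an assumed untreated mortality of 95%, amputation comes
# out ahead unless surgical mortality is also roughly 95% or worse. No historical data here.
untreated_mortality = 0.95  # the guess above, not a measured figure

for surgical_mortality in (0.30, 0.50, 0.70, 0.90, 0.96):
    better = (1 - surgical_mortality) > (1 - untreated_mortality)
    print(f"assumed surgical mortality {surgical_mortality:.0%}: "
          f"{'amputation wins' if better else 'amputation loses'}")
```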
Do you believe that’s the case, and if so, why?
Well, because peer review would do things like say “your proof has a bug,” “you didn’t cite this important paper,” “this is a very minor modification of [approach].” Peer review in my case is a social institution where smart knowledgeable people read my stuff.
You can say that’s heavily confounded by your field, the types of papers you write (or review), etc., and I agree! But that is of little relevance to gwern, he thinks the whole thing needs to be burned to the ground.
If anything, the latter suggests that we need to make the case for peer review field by field, and so proponents have an even harder time than they do without that claim!
Not following. The claim “peer review sucks for all X” is stronger than the claim “peer review sucks for some X.” The person making the stronger claim will have a harder time demonstrating it than the person making the weaker claim. So as a status quo defender, I have an easier time attacking the stronger claim.
I think treating gangrene by amputation was well known in the ancient world. Depending on how you deal w/ hemorrhage/complications you would have a pretty high post-surgery mortality rate, but the point is, it is still an improvement on gangrene killing you for sure.
I think you missed the meat of my claim; yes, al-Zahrawi said to amputate as a response to gangrene, but that is not a solid empirical basis, and as a result it is not obvious that it actually extended lifespans on net. We don’t have the data to verify, and we don’t have reason to trust their methodology.
Now, maybe gangrene is a case where we can move away from priors on whether archaic surgery was net positive or net negative based on inside view reasoning. I’m not a doctor or a medical historian, and the one place I can think of to look for data (homeopathic treatment of gangrene) doesn’t seem to have any sort of aggregated data, just case reports of survival. Perhaps an actual medical historian could determine it one way or the other, or come up with a better estimate of the survival rate. But my guess is that 95% is a very high estimate.
You can say that’s heavily confounded by your field, the types of papers you write (or review), etc., and I agree!
I could, but why? I’ll simply point out that it is not science, and that it’s not even trying to be science. It’s raw good intentions.
But that is of little relevance to gwern, he thinks the whole thing needs to be burned to the ground.
Suppose that the person on the street thinks that price caps on food are a good idea, because it would be morally wrong to gouge on necessities and the poor deserve to be able to afford to eat. Then someone comes along and points out that the frequent queues, or food shortages, or starvation, are a consequence of this policy, regardless of the policy’s intentions.
The person on the street is confused—but food being cheap is a good thing, why is this person so angry about price caps? They’re angry because of the difference between perception of policies and their actual consequences.
So as a status quo defender,
The claim I saw you as making is that peer review’s efficacy in field x is unrelated to its efficacy in field y. If true, that makes it harder for either of us to convince the other in either direction. I, with the null hypothesis that peer review does not add scientific value, would need to be convinced of peer review’s efficacy in every field separately. The situation is symmetric for you: your null hypothesis that peer review adds scientific value would need to be defeated in every field separately.
Now, whether or not our null hypothesis should be efficacy or lack of efficacy is a key component of this whole debate. How would you go about arguing that, say, to someone who believed that prayer caused rain?
I think you missed the meat of my claim; yes, al-Zahrawi said to amputate as a response to gangrene, but that is not a solid empirical basis
Why do you suppose he said this? People didn’t have Bacon’s method, but people had eyes, and accumulated experience. Neolithic people managed, over time, to figure out how all the useful plants in their biome are useful; how did they do it without science? “Science” isn’t this thing that came on a beam of light once Bacon finished his writings. Humans had bits and pieces of science right for a long time (heck, my favorite citation is a two-arm nutrition trial in the Book of Daniel in the Old Testament).
Now, maybe gangrene is a case
We can ask a doc, but I am pretty sure post-wound gangrene is basically fatal if untreated.
I’ll simply point out that it is not science
What is not science? My direct experience with peer review? “Science” is a method you use to tease things out from a disinterested Nature that hides the mechanism, but spits data at you. If you had direct causal access to a system, you would examine it directly. If I have a computer program on my laptop, I am not going to “do science” to it, I am going to look at it and see what it does.
Note that I am only talking about peer review I am familiar with. I am not making claims about social psychology peer review, because I don’t live in that world. It might be really bad—that’s for social psychologists to worry about. In fact, they are doing a lot of loud soul searching right now: system working as intended. The misdeeds of social psychology don’t really reflect on me or my field, we have our own norms. My only intersection with social psychology is me supplying them with useful mediation methodology sometimes.
I expect gwern’s policy of being really angry on the internet is going to have either a zero effect or a mildly negative effect on the problem.
consequence of this policy
The consequence of peer review for me, on the receiving end, is that people generally improve my paper (and are sometimes picky for silly reasons). The consequence for me, on the giving side, is that I reject shitty papers, and make good and marginal papers better. I don’t need to “do science” to know this, I can just look at my pre-peer-review and my post-peer-review drafts, for instance. Or I can show you that the paper I rejected had an invalid theorem in it.
The claim I saw you as making is that peer review’s efficacy in field x is unrelated to its efficacy in field y.
I am making the claim that people who want to burn the whole system to the ground need to realize that academia is very large, and has very different social norms in different corners. A unified criticism isn’t really possible. Egregious cases of peer review are not hard to find, but that’s neither here nor there.
Why do you suppose he said this? People didn’t have Bacon’s method, but people had eyes, and accumulated experience.
Sure. I think al-Zahrawi got observational evidence, but I think that there are systematic defects in how humans collect data from observation, which makes observational judgments naturally suspect. That is, I’m happy to take “al-Zahrawi says X” as a good reason to promote X as a hypothesis worthy of testing, but I am more confident in reality’s entanglement with test results than with proposed hypotheses.
“Science” isn’t this thing that came on a beam of light once Bacon finished his writings. Humans had bits and pieces of science right for a long time (heck, my favorite citation is a two-arm nutrition trial in the Book of Daniel in the Old Testament).
I very much agree that science is some combination of methodology and principles which was gradually discovered by humans, which is categorically unlike revealed knowledge, and whose core goal is the creation of maps that describe the territory as closely and correctly as possible. (To be clear, science in this view is not ‘having that goal,’ but actions and principles that actually lead to achieving that goal.)
We can ask a doc, but I am pretty sure post-wound gangrene is basically fatal if untreated.
I asked history.stackexchange; we’ll see if that produces anything useful. Asking doctors is also a good idea, but I don’t have as easy an in for that.
What is not science? My direct experience with peer review?
Not quite—what I had in mind as “not science” was mistaking your direct experience with peer review, and your evaluation of its intentions, for a scientific case for peer review.
Note that I am only talking about peer review I am familiar with.
Right now, sure, but we got onto this point because you thought not publishing with peer review means we can’t be sure MIRI isn’t wasting donor money, which makes sense primarily if we’re confident in peer review in MIRI’s field.
I expect gwern’s policy of being really angry on the internet is going to have either a zero effect or a mildly negative effect on the problem.
Eh. While I agree that being angry on the internet is unsightly, it’s not obvious to me that it’s ineffective at accomplishing useful goals.
I am making the claim that people who want to burn the whole system to the ground need to realize that academia is very large, and has very different social norms in different corners.
“Whole system” seems unclear. It’s pretty obvious to me that gwern wants to kill a specific element for solid reasons, as evidenced by the following quotes:
What makes science work is replication and putting your work out there for community evaluation. Those are the real review by peers. …
Yes, I have no objection to ‘peer review’ if what you mean by it is all the things I singled out as opposed to, prior to, and after the institution of peer review: having colleagues critique your work, having many other people with different perspectives & knowledge check it over and replicate it and build on it and post essays rebutting it—all this is great stuff, we both agree. I would say replication is the most important of those elements, but all have their place.
What I am attacking is the very specific formal institutional practice of journals outsourcing editorial judgment to a few selected researchers and effectively giving them veto power, a process which hardly seems calculated to yield very good results and which does not seem to have been institutionalized because it has been rigorously demonstrated to work far better than the pre-existing alternatives
I am making the claim that people who want to burn the whole system to the ground need to realize that academia is very large, and has very different social norms in different corners. A unified criticism isn’t really possible.
Would you agree that some parts of the system should be burned to the ground?
On the subject of medical advice, Scott and Scurvy reminded me of this conversation.