Admiral Nelson had multiple surgeries and was in serious danger of infection and death afterwards, but he would have been a goner for sure without surgery.
This argument seems exactly identical to the argument for trepanning, even including the survivorship bias. (One of the suspected uses of trepanning was to revive people otherwise thought dead.)
While we’re looking at anecdotes, this bit of Nelson’s experience with surgery seems relevant:
Although surgeons had been unable to remove the central ligature in his amputated arm, which had caused considerable inflammation and poisoning, in early December it came out of its own accord and Nelson rapidly began to recover.
I’m not sure I’d count that as a win for surgery, or evidence that he couldn’t have survived without it!
Gwern, peer review is my life. My tenure case will be decided by peer review, ultimately. I do peer review myself as a service, constantly. I know all about peer review.
But this means that, unless you’re particularly good at distancing yourself from your work, you should expect to be worse at judging it than a disinterested observer. The classic anecdote about “which half?” comes to mind, or the reaction of other obstetricians to Semmelweis’s concerns.
Regardless, we would expect that, if studies are better than anecdotes, studies on peer review will outperform anecdotes on peer review, right?
This argument seems exactly identical to the argument for trepanning, even including the survivorship bias.
It’s not identical because we know, with benefit of hindsight, that amputating potentially gangrenous limbs is a good idea. The folks in the past had solid empirical basis for amputations, even if they did not fully understand gangrene. Medicine was mostly, but not always, nonsense in the past. A lot of the stuff was not based on the scientific method, because they had no scientific method. But there were isolated communities that came up with sensible things for sensible reasons. This is one case when standard practices were sensible (there are other isolated examples, e.g. honey to disinfect wounds).
But this means that, unless you’re particularly good at distancing yourself from your work, you should expect to be worse at judging it than a disinterested observer.
studies on peer review will outperform anecdotes on peer review, right?
Ok, but isn’t this “incentive tennis?” Gwern’s incentives are clearer than mine here—he’s not a mainstream academic, so he loses out on status. So a “low motive” interpretation of the argument is: “your status castle is built on sand, tear it down!” Gwern is also pretty angry. Are we going to stockpile argument ammunition [X] of the form “you are more biased when evaluating peer review because of [X]”?
For me, peer review is a double edged sword—I get papers rejected sometimes, and at other times I get silly reviewer comments, or editors that make me spend years revising. I have a lot of data both ways. The point with peer review is I sleep better at night due to extra sanity checking. Who sanity-checks MIRI’s whiteboard stuff?
A “low motive” argument for me would be “keep peer review, but have it softball all my papers, they are obviously so amazing why can’t you people see that!”
A “low motive” argument for MIRI would be “look buddy, we are trying to save the world here, we don’t have time for your flawed human institutions. Don’t you worry about our whiteboard content, you probably don’t know enough math to understand it anyways.” MIRI is doing pretty theoretical decision theory. Is that a good idea? Are they producing enough substantive work? In standard academia peer review would help with the former question, and answering to the grant agency and tenure pressure would help with the second. These are not perfect incentives, but they are there. Right now there are absolutely no guard rails in place preventing MIRI from going off the deep end.
Your argument basically says not to trust domain experts, that’s the opposite of what should be done.
Gwern also completely ignores effect modification (e.g. the practice of evaluating conditional effects after conditioning on things like paper topic). Peer review cultures for empirical social science papers and for theoretical physics papers basically have nothing to do with each other.
The folks in the past had solid empirical basis for amputations, even if they did not fully understand gangrene.
I would put the start of solid empirical basis for gangrene treatment at Middleton Goldsmith during the American Civil War (dropping mortality from 45% to 3%), about sixty years after Nelson.
This is one case when standard practices were sensible (there are other isolated examples, e.g. honey to disinfect wounds).
I think this is putting too much weight on superficial resemblance. Yes, gangrene treatment from Goldsmith to today involves amputation. But that does not mean amputation pre-Goldsmith actually decreased mortality over no treatment! My priors are pretty strong that it would increase it, but going into details on my priors is perhaps a digression. (The short version is that I take a very Hansonian view of medicine and its efficacy.) I’m not aware of (but would greatly appreciate) any evidence on that question.
(To see where I’m coming from, consider that there is a reference class that contains both “trepanning” and “brain surgery” that seems about as natural as the reference class that includes amputation before and after Goldsmith.)
The point with peer review is I sleep better at night due to extra sanity checking.
But this only makes sense if peer review actually improves the quality of studies. Do you believe that’s the case, and if so, why?
Your argument basically says not to trust domain experts, that’s the opposite of what should be done.
I think my argument is domain expert tennis. That is, I think that in order to evaluate whether or not peer review is effective, we shouldn’t ask scientists who use peer review, we should ask scientists who study peer review. Similarly, in order to determine whether a treatment is effective, we shouldn’t ask the users of the treatment, but statisticians. If you go down to the church/synagogue/mosque, they’ll say that prayer is effective, and they’re obviously the domain experts on prayer. I’m just applying the same principles and same level of skepticism.
Gwern also completely ignores effect modification (e.g. the practice of evaluating conditional effects after conditioning on things like paper topic). Peer review cultures for empirical social science papers and for theoretical physics papers basically have nothing to do with each other.
I am not sure what the relevance of either of these is. If anything, the latter suggests that we need to make the case for peer review field by field, and so proponents have an even harder time than they do without that claim!
I would put the start of solid empirical basis for gangrene treatment at Middleton Goldsmith during the American Civil War (dropping mortality from 45% to 3%), about sixty years after Nelson.
I think treating gangrene by amputation was well known in the ancient world. Depending on how you deal w/ hemorrhage/complications you would have a pretty high post-surgery mortality rate, but the point is, it is still an improvement on gangrene killing you for sure.
Actually, while I didn’t look into this, I expect Jewish and Greek surgeons would have been pretty good compared to medieval European ones.
But that does not mean amputation pre-Goldsmith actually decreased mortality over no treatment!
I don’t have data from the ancient world :). But mortality from gangrene if you leave the dead tissue in place is what, >95%? Amputation didn’t have to be perfect or even very good, it merely had to do better than an almost certain death sentence.
Do you believe that’s the case, and if so, why?
Well, because peer review would do things like say “your proof has a bug,” “you didn’t cite this important paper,” “this is a very minor modification of [approach].” Peer review in my case is a social institution where smart knowledgeable people read my stuff.
You can say that’s heavily confounded by your field, the types of papers you write (or review), etc., and I agree! But that is of little relevance to gwern, he thinks the whole thing needs to be burned to the ground.
If anything, the latter suggests that we need to make the case for peer review field by field, and so proponents have an even harder time than they do without that claim!
Not following. The claim “peer review sucks for all X,” is stronger than the claim “peer review sucks for some X.” The person making the stronger claim will have a harder time demonstrating it than the person making the weaker claim. So as a status quo defender, I have an easier time attacking the stronger claim.
I think treating gangrene by amputation was well known in the ancient world. Depending on how you deal w/ hemorrhage/complications you would have a pretty high post-surgery mortality rate, but the point is, it is still an improvement on gangrene killing you for sure.
I think you missed the meat of my claim; yes, al-Zahrawi said to amputate as a response to gangrene, but that is not a solid empirical basis, and as a result it is not obvious that it actually extended lifespans on net. We don’t have the data to verify, and we don’t have reason to trust their methodology.
Now, maybe gangrene is a case where we can move away from priors on whether archaic surgery was net positive or net negative based on inside view reasoning. I’m not a doctor or a medical historian, and the one place I can think of to look for data (homeopathic treatment of gangrene) doesn’t seem to have any sort of aggregated data, just case reports of survival. Perhaps an actual medical historian could determine it one way or the other, or come up with a better estimate of the survival rate. But my guess is that 95% is a very high estimate.
You can say that’s heavily confounded by your field, the types of papers you write (or review), etc., and I agree!
I could, but why? I’ll simply point out that is not science, and that it’s not even trying to be science. It’s raw good intentions.
But that is of little relevance to gwern, he thinks the whole thing needs to be burned to the ground.
Suppose that the person on the street thinks that price caps on food are a good idea, because it would be morally wrong to gouge on necessities and the poor deserve to be able to afford to eat. Then someone comes along and points out that the frequent queues, or food shortages, or starvation, are a consequence of this policy, regardless of the policy’s intentions.
The person on the street is confused—but food being cheap is a good thing, why is this person so angry about price caps? They’re angry because of the difference between perception of policies and their actual consequences.
So as a status quo defender,
The claim I saw you as making is that peer review’s efficacy in field x is unrelated to its efficacy in field y. If true, that makes it harder for either of us to convince the other in either direction. I, with the null hypothesis that peer review does not add scientific value, would need to be convinced of peer review’s efficacy in every field separately. The situation is symmetric for you: your null hypothesis that peer review adds scientific value would need to be defeated in every field separately.
Now, whether or not our null hypothesis should be efficacy or lack of efficacy is a key component of this whole debate. How would you go about arguing that, say, to someone who believed that prayer caused rain?
I think you missed the meat of my claim; yes, al-Zahrawi said to amputate as a response to gangrene, but that is not a solid empirical basis
Why do you suppose he said this? People didn’t have Bacon’s method, but people had eyes, and accumulated experience. Neolithic people managed, over time, to figure out how all the useful plants in their biome are useful, how did they do it without science? “Science” isn’t this thing that came on a beam of light once Bacon finished his writings. Humans had bits and pieces of science right for a long time (heck, my favorite citation is a two arm nutrition trial in the Book of Daniel in the Old Testament).
Now, maybe gangrene is a case
We can ask a doc, but I am pretty sure post-wound gangrene is basically fatal if untreated.
I’ll simply point out that is not science
What is not science? My direct experience with peer review? “Science” is a method you use to tease things out from a disinterested Nature that hides the mechanism, but spits data at you. If you had direct causal access to a system, you would examine it directly. If I have a computer program on my laptop, I am not going to “do science” to it, I am going to look at it and see what it does.
Note that I am only talking about peer review I am familiar with. I am not making claims about social psychology peer review, because I don’t live in that world. It might be really bad—that’s for social psychologists to worry about. In fact, they are doing a lot of loud soul searching right now: system working as intended. The misdeeds of social psychology don’t really reflect on me or my field, we have our own norms. My only intersection with social psychology is me supplying them with useful mediation methodology sometimes.
I expect gwern’s policy of being really angry on the internet is going to have either a zero effect or a mildly negative effect on the problem.
consequence of this policy
The consequence of peer review for me, on the receiving end, is generally that people improve my paper (and sometimes are picky for silly reasons). The consequence of peer review for me, on the giving side, is that I reject shitty papers, and make good and marginal papers better. I don’t need to “do science” to know this, I can just look at my pre-peer review and my post-peer review drafts, for instance. Or I can show you that the paper I rejected had an invalid theorem in it.
The claim I saw you as making is that peer review’s efficacy in field x is unrelated to its efficacy in field y.
I am making the claim that people who want to burn the whole system to the ground need to realize that academia is very large, and has very different social norms in different corners. A unified criticism isn’t really possible. Egregious cases of peer review are not hard to find, but that’s neither here nor there.
Why do you suppose he said this? People didn’t have Bacon’s method, but people had eyes, and accumulated experience.
Sure. I think al-Zahrawi got observational evidence, but I think that there are systematic defects in how humans collect data from observation, which makes observational judgments naturally suspect. That is, I’m happy to take “al-Zahrawi says X” as a good reason to promote X as a hypothesis worthy of testing, but I am more confident in reality’s entanglement with test results than proposed hypotheses.
“Science” isn’t this thing that came on a beam of light once Bacon finished his writings. Humans had bits and pieces of science right for a long time (heck, my favorite citation is a two arm nutrition trial in the Book of Daniel in the Old Testament).
I very much agree that science is some combination of methodology and principles which was gradually discovered by humans, and categorically unlike revealed knowledge, whose core goal is the creation of maps that describe the territory as closely and correctly as possible. (To be clear, science in this view is not ‘having that goal,’ but actions and principles that actually lead to achieving that goal.)
We can ask a doc, but I am pretty sure post-wound gangrene is basically fatal if untreated.
I asked history.stackexchange; we’ll see if that produces anything useful. Asking doctors is also a good idea, but I don’t have as easy an in for that.
What is not science? My direct experience with peer review?
Not quite—what I had in mind as “not science” was confusing your direct experience with peer review and evaluation of the intentions as a scientific case for peer review.
Note that I am only talking about peer review I am familiar with.
Right now, sure, but we got onto this point because you thought not publishing with peer review means we can’t be sure MIRI isn’t wasting donor money, which makes sense primarily if we’re confident in peer review in MIRI’s field.
I expect gwern’s policy of being really angry on the internet is going to have either a zero effect or a mildly negative effect on the problem.
Eh. While I agree that being angry on the internet is unsightly, it’s not obvious to me that it’s ineffective at accomplishing useful goals.
I am making the claim that people who want to burn the whole system to the ground need to realize that academia is very large, and has very different social norms in different corners.
“Whole system” seems unclear. It’s pretty obvious to me that gwern wants to kill a specific element for solid reasons, as evidenced by the following quotes:
What makes science work is replication and putting your work out there for community evaluation. Those are the real review by peers. …
Yes, I have no objection to ‘peer review’ if by what you mean is all the things I singled out as opposed to, and prior to, and afterwards, the institution of peer review: having colleagues critique your work, having many other people with different perspectives & knowledge check it over and replicate it and build on it and post essays rebutting it—all this is great stuff, we both agree. I would say replication is the most important of those elements, but all have their place.
What I am attacking is the very specific formal institutional practice of journals outsourcing editorial judgment to a few selected researchers and effectively giving them veto power, a process which hardly seems calculated to yield very good results and which does not seem to have been institutionalized because it has been rigorously demonstrated to work far better than the pre-existing alternatives
I am making the claim that people who want to burn the whole system to the ground need to realize that academia is very large, and has very different social norms in different corners. A unified criticism isn’t really possible.
Would you agree that some parts of the system should be burned to the ground?
On the subject of medical advice, Scott and Scurvy reminded me of this conversation.