Having been through some of that process… it’s less than stellar.
That recent “creator” paper somehow managed to get through peer review, and in the past it’s been clear to me that some reviewers have no idea what they’ve been asked to review and just wave it through with a few requests for spelling and grammar corrections.
To an extent it’s a very similar problem to the ones faced in programming and engineering. Asking for more feedback once the work is already finished is just the waterfall model applied to research.
To an extent, even if researchers weren’t being asked to publicly post their pre-registration, getting them to actually work out in advance what they’re planning to measure is a little like getting programmers to adopt Test Driven Development (write the tests, then write the code), which tends to produce higher-quality output.
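For anyone who hasn’t run into the analogy, here’s a minimal TDD-style sketch in Python. The function name and numbers are hypothetical; the only point is the order of operations, test first, implementation second:

    # Step 1: write the test first. It pins down, in advance, what the code
    # is supposed to do, much as a pre-registration pins down in advance
    # what a study is supposed to measure and how it will be analysed.
    def test_mean_firing_rate():
        assert mean_firing_rate([10.0, 20.0, 30.0]) == 20.0
        assert mean_firing_rate([]) == 0.0

    # Step 2: only then write the implementation, and judge it against the
    # pre-written test rather than deciding afterwards what counts as success.
    def mean_firing_rate(rates):
        if not rates:
            return 0.0
        return sum(rates) / len(rates)

    if __name__ == "__main__":
        test_mean_firing_rate()
        print("test passed")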
Despite those 8 years, a lot of people still don’t really know what they’re doing in research and just sort of ape their supervisor (who may have been in the same situation).
Since the system is still half-modeled on the old medieval master-journeyman-apprentice system, you also get truly massive variation in ability and competence, so simply trusting that people are highly qualified isn’t very reliable.
The simplest way to illustrate the problem is to point to really basic statistical errors which make it into huge portions of the literature. Basic errors which have made it past supervisors, past reviewers, past editors, past many people with PhDs, and not one of them picked it up.
(This is just an example; there are many, many other basic errors made constantly in research.)
Quoting from the badscience.net write-up of the Nieuwenhuis survey (http://www.badscience.net/2011/10/what-if-academics-were-as-dumb-as-quacks-with-statistics/):

They’ve identified one direct, stark statistical error that is so widespread it appears in about half of all the published papers surveyed from the academic neuroscience research literature.
To understand the scale of this problem, first we have to understand the statistical error they’ve identified. This is slightly difficult, and it will take 400 words of pain. At the end, you will understand an important aspect of statistics better than half the professional university academics currently publishing in the field of neuroscience.
Let’s say you’re working on some nerve cells, measuring the frequency with which they fire. When you drop a chemical on them, they seem to fire more slowly. You’ve got some normal mice, and some mutant mice. You want to see if their cells are differently affected by the chemical. So you measure the firing rate before and after applying the chemical, first in the mutant mice, then in the normal mice.
When you drop the chemical on the mutant mice nerve cells, their firing rate drops, by 30%, say. With the number of mice you have (in your imaginary experiment) this difference is statistically significant, which means it is unlikely to be due to chance. That’s a useful finding which you can maybe publish. When you drop the chemical on the normal mice nerve cells, there is a bit of a drop in firing rate, but not as much – let’s say the drop is 15% – and this smaller drop doesn’t reach statistical significance.
But here is the catch. You can say that there is a statistically significant effect for your chemical reducing the firing rate in the mutant cells. And you can say there is no such statistically significant effect in the normal cells. But you cannot say that mutant cells and normal cells respond to the chemical differently. To say that, you would have to do a third statistical test, specifically comparing the “difference in differences”, the difference between the chemical-induced change in firing rate for the normal cells against the chemical-induced change in the mutant cells.
Now, looking at the figures I’ve given you here (entirely made up, for our made up experiment) it’s very likely that this “difference in differences” would not be statistically significant, because the responses to the chemical only differ from each other by 15%, and we saw earlier that a drop of 15% on its own wasn’t enough to achieve statistical significance.
But in exactly this situation, academics in neuroscience papers are routinely claiming that they have found a difference in response, in every field imaginable, with all kinds of stimuli and interventions: comparing responses in younger versus older participants; in patients against normal volunteers; in one task against another; between different brain areas; and so on.
How often? Nieuwenhuis looked at 513 papers published in five prestigious neuroscience journals over two years. In half the 157 studies where this error could have been made, it was made.
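To make the quoted example concrete, here is a small simulation of both the error and the test that should have been run instead. It’s only a sketch: the group size, means, spreads and random seed are all assumptions chosen so the pattern described above tends to appear, and plain t-tests from scipy stand in for whatever analysis a real paper would use:

    # Simulated per-cell change in firing rate after the chemical (in %),
    # for mutant and normal mice. All numbers are made up for illustration.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 12  # assumed number of cells per group

    mutant_change = rng.normal(loc=-30, scale=25, size=n)  # mean drop ~30%
    normal_change = rng.normal(loc=-15, scale=25, size=n)  # mean drop ~15%

    # Test 1: is the drop in mutant cells significantly different from zero?
    # (typically yes with these numbers)
    _, p_mutant = stats.ttest_1samp(mutant_change, 0)

    # Test 2: is the drop in normal cells significantly different from zero?
    # (often not)
    _, p_normal = stats.ttest_1samp(normal_change, 0)

    # The invalid shortcut is to stop here and claim the two groups differ.
    # Test 3 is the one actually needed: compare the change scores directly,
    # i.e. the "difference in differences".
    _, p_diff = stats.ttest_ind(mutant_change, normal_change)

    print(f"mutant drop vs zero:       p = {p_mutant:.3f}")
    print(f"normal drop vs zero:       p = {p_normal:.3f}")
    print(f"difference in differences: p = {p_diff:.3f}")  # often > 0.05

The same comparison could also be done as an interaction term in a two-way model; the two-sample test on the change scores is just the shortest way to show the point.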
It makes sense when you realize that many people simply ape their supervisors and the existing literature. When bad methods make it into a paper, people copy those methods without ever considering whether they’re obviously incorrect.
I’m arguing we need more feedback rather than more filtering.
You’re arguing that the new filtering will be more effective than the old filtering, and as proof you point to all the ways the old filtering method has failed.
But pointing out that filtering didn’t work in the past is not a criticism of my argument that we need more feedback, such as objective post-publication reviews of articles. I never argued that the old filtering method works.
If you believe the old filtering method isn’t a stringent filtering system, do you believe it wouldn’t make much difference if we removed it and let anybody publish anywhere, without peer review, as long as they preregistered their study? Would that produce an improvement?
I think you also need to contend with the empirical evidence from COMPare that preregistration (the new filtering method you support) hasn’t been effective so far.
I think more stringent filtering can increase reliability, but doing so will also increase wastefulness. Feedback can increase reliability without increasing wastefulness.
Feedback from supervisors and feedback from reviewers is what the current system is mostly based on. We’re already in a mostly-feedback system, but the feedback is disorganised and poorly standardised, and it tends to stop very shortly after publication.
Some of the better journals operate blinded reviews, so that in theory “anybody” should be able to publish a paper if the quality is good, and that’s a good thing.
COMPare implies that preregistration didn’t solve all the problems, but other studies have shown that it has massively improved matters.