I would regard projects like COMPare, which rate studies after publication, as much more valuable than preregistration. Yes, preregistration reduces researcher degrees of freedom, but it also increases red tape. Ioannidis mentions how researchers are spending too much time chasing funds. Preregistration increases costs (in terms of extra work) to the researcher, encouraging them to chase more funding. Increasing quality will likely require reducing the cost of doing higher-quality research, not increasing it. Yes, I’m aware COMPare is using the preregistration to rate the studies, but that’s just one method. The open question for me with preregistration is: what is the opportunity cost? If researchers are now spending this extra time figuring out, right from the beginning of the study, exactly what they plan to do, and then filling out preregistration forms, what are they not doing instead?
If you don’t publicly pre-commit to what you’re going to measure, then p-values become a bit meaningless, since nobody can know whether that was the only thing you measured.
If researchers are well organized, then pre-reg should be almost free. On the other hand, if they’re disorganized, winging it, and making things up as they go along, then pre-reg will look like a terrible burden, since it forces them to decide what they’re actually going to do.
The short general version of my argument is: feedback > filtering
I would agree that preregistration is one way to make p-values more useful. It may be the best way to determine what the researcher originally intended to measure, but it’s not the only way to know if that was the only thing a researcher measured. I’ve found asking questions often works.
If we’re talking strictly about properly run RCTs, then I would agree: preregistration is close to free, relatively speaking. But that’s because a properly conducted RCT is such a big undertaking that the cost of most filtering initiatives is going to be relatively small by comparison. But RCTs aren’t the only kind of study design out there. They are the gold standard, yes, in that they have the greatest robustness, but their major drawback is that they’re expensive to conduct properly relative to alternatives.
Science already has a pretty strong filter. Researchers need to spend 8 years (and usually much more than that) after high school working towards a PhD. They then have to decide that what they’re doing is the best way to analyze a problem, or, if they’re still in grad school, their professor has to approve it and they have to believe in it. Then two or more other people with PhDs who weren’t involved in the research (editor and peer reviewer(s)) have to review what the researcher did and come to the conclusion that the research was properly conducted. I don’t view this as principally a filtering problem. Filtering can improve quality, but it also reduces the number of possible ways to conduct research. The end result of excessive filtering, to me, is that everybody ends up just doing RCTs for everything, which is extremely cost-inefficient and leads to the problem of everybody chasing funding. If nobody with less than a million dollars at their disposal can conduct a study, I think that’s a problem.
Having been through some of that process… it’s less than stellar.
That recent “creator” paper managed, somehow, to get through peer review, and in the past it’s been clear to me that reviewers sometimes have no clue about what they’ve been asked to review and just sort of wave it through with a few requests for spelling and grammar corrections.
To an extent it’s a very similar problem to ones faced in programming and engineering. Asking for more feedback after the fact is just the waterfall model applied to research: the problems only get discovered once everything has already been built.
To an extent, even if researchers weren’t being asked to publicly post their pre-reg, getting them to actually work out what they’re planning to measure is a little like getting programmers to adopt Test Driven Development (write the tests, then write the code), which tends to produce higher-quality output.
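For anyone unfamiliar with TDD, here’s a toy sketch (a hypothetical example of my own, not from anything above) of what “write the tests, then write the code” looks like:

```python
# Toy TDD sketch (hypothetical example): the test is written first, pinning down
# the intended behaviour before any implementation exists.
def test_mean_firing_rate():
    assert mean_firing_rate([10.0, 12.0, 14.0]) == 12.0

# Only afterwards is the code written to make that test pass.
def mean_firing_rate(rates):
    return sum(rates) / len(rates)

test_mean_firing_rate()  # passes once the implementation matches the pre-committed spec
```

The parallel with pre-registration is just that the expected behaviour gets pinned down before the work is done.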
Despite those 8 years, a lot of people still don’t really know what they’re doing in research and just sort of ape their supervisor (who may have been in the same situation).
Since the system is still half-modeled on the old medieval master-journeyman-apprentice system, you also get massive variation in ability and competence, so simply trusting that people are highly qualified isn’t very reliable.
The simplest way to illustrate the problem is to point to really basic stats errors which make it into huge portions of the literature: basic errors which have made it past supervisors, past reviewers, and past editors. They’ve made it past many people with PhDs, and not one picked up on them.
(This is just an example; there are many other basic errors made constantly in research.) From http://www.badscience.net/2011/10/what-if-academics-were-as-dumb-as-quacks-with-statistics/:
They’ve identified one direct, stark statistical error that is so widespread it appears in about half of all the published papers surveyed from the academic neuroscience research literature.
To understand the scale of this problem, first we have to understand the statistical error they’ve identified. This is slightly difficult, and it will take 400 words of pain. At the end, you will understand an important aspect of statistics better than half the professional university academics currently publishing in the field of neuroscience.
Let’s say you’re working on some nerve cells, measuring the frequency with which they fire. When you drop a chemical on them, they seem to fire more slowly. You’ve got some normal mice, and some mutant mice. You want to see if their cells are differently affected by the chemical. So you measure the firing rate before and after applying the chemical, first in the mutant mice, then in the normal mice.
When you drop the chemical on the mutant mice nerve cells, their firing rate drops, by 30%, say. With the number of mice you have (in your imaginary experiment) this difference is statistically significant, which means it is unlikely to be due to chance. That’s a useful finding which you can maybe publish. When you drop the chemical on the normal mice nerve cells, there is a bit of a drop in firing rate, but not as much – let’s say the drop is 15% – and this smaller drop doesn’t reach statistical significance.
But here is the catch. You can say that there is a statistically significant effect for your chemical reducing the firing rate in the mutant cells. And you can say there is no such statistically significant effect in the normal cells. But you cannot say that mutant cells and normal cells respond to the chemical differently. To say that, you would have to do a third statistical test, specifically comparing the “difference in differences”, the difference between the chemical-induced change in firing rate for the normal cells against the chemical-induced change in the mutant cells.
Now, looking at the figures I’ve given you here (entirely made up, for our made up experiment) it’s very likely that this “difference in differences” would not be statistically significant, because the responses to the chemical only differ from each other by 15%, and we saw earlier that a drop of 15% on its own wasn’t enough to achieve statistical significance.
But in exactly this situation, academics in neuroscience papers are routinely claiming that they have found a difference in response, in every field imaginable, with all kinds of stimuli and interventions: comparing responses in younger versus older participants; in patients against normal volunteers; in one task against another; between different brain areas; and so on.
How often? Nieuwenhuis looked at 513 papers published in five prestigious neuroscience journals over two years. In half the 157 studies where this error could have been made, it was made.
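To make the quoted error concrete, here’s a minimal sketch in Python (entirely made-up data, mirroring the imaginary experiment above; the group sizes, baseline rate, and noise levels are my own assumptions), contrasting the two within-group tests with the difference-in-differences test that would actually support the claim:

```python
# Minimal sketch of the error described above, with made-up firing-rate data.
# Assumptions (mine, for illustration): paired before/after measurements per cell,
# two groups of cells, normally distributed noise.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n = 12           # cells per group (hypothetical)
baseline = 10.0  # Hz, hypothetical baseline firing rate

# Made-up data: mutant cells drop by roughly 30% after the chemical,
# normal cells by roughly 15%, with noise.
mutant_before = rng.normal(baseline, 1.5, n)
mutant_after = mutant_before * rng.normal(0.70, 0.15, n)
normal_before = rng.normal(baseline, 1.5, n)
normal_after = normal_before * rng.normal(0.85, 0.15, n)

# The flawed inference pattern: run one within-group test per group, then claim a
# group difference because one test comes out significant and the other doesn't.
p_mutant = stats.ttest_rel(mutant_before, mutant_after).pvalue
p_normal = stats.ttest_rel(normal_before, normal_after).pvalue
print(f"within-group p, mutant cells: {p_mutant:.3f}")
print(f"within-group p, normal cells: {p_normal:.3f}")

# The test that actually bears on "do the groups respond differently?": compare the
# chemical-induced *changes* between groups (a simple difference-in-differences test).
mutant_change = mutant_after - mutant_before
normal_change = normal_after - normal_before
p_diff_in_diff = stats.ttest_ind(mutant_change, normal_change).pvalue
print(f"between-group p, difference in differences: {p_diff_in_diff:.3f}")
```

The claim “the two kinds of cells respond differently” rests on that third p-value, not on whether the first two happen to fall on opposite sides of 0.05.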
It makes sense when you realize that many people simply ape their supervisors and the existing literature. When bad methods make it into a paper people copy those methods without ever considering whether they’re obviously incorrect.
I’m arguing we need more feedback rather than more filtering.
You’re arguing that the new filtering will be more effective than the old filtering, and as proof you’re pointing to all the ways the old filtering method has failed.
But pointing out that filtering didn’t work in the past is not a criticism of my argument that we need more feedback, such as objective post-publication reviews of articles. I never argued that the old filtering method works.
If you believe the old filtering method isn’t a stringent filtering system, do you believe it wouldn’t make much difference if we removed it, and let anybody publish anywhere without peer review as long as they preregistered their study? Would this produce an improvement?
I think you also need to contend with the empirical evidence from COMPare that preregistration (the new filtering method you support) hasn’t been effective so far.
I think more stringent filtering can increase reliability, but doing so will also increase wastefulness. Feedback can increase reliability without increasing wastefulness.
Feedback from supervisors and feedback from reviewers is what the current system is mostly based on. We’re currently in a mostly-feedback system, but it’s disorganised, poorly standardised feedback, and it tends to end a very short time after publication.
Some of the better journals operate blinded reviews, so that in theory “anybody” should be able to publish a paper if the quality is good, and that’s a good thing.
COMPare implies that preregistration didn’t solve all the problems, but other studies have shown that it has massively improved matters.
If that’s true, why are replication rates so poor?
It may be the best way to determine what the researcher originally intended to measure, but it’s not the only way to know if that was the only thing a researcher measured. I’ve found asking questions often works.
You can ask questions, but how do you know whether the answers you are getting are right? It’s quite easy for people who fit a linear model to play around a bit with the parameters and not even remember all the parameters they tested.
Then two or more other people with PhDs who weren’t involved in the research (editor and peer reviewer(s)) have to review what the researcher did
More often they don’t review what the researcher did but what the researcher claimed they did.
If that’s true, why are replication rates so poor?
There is no feedback post-publication. Researchers are expected to individually decide on the quality of a published study, or occasionally ask colleagues in their department.
I don’t get the impression that low replication rates are due to malice generally. I think it’s a training and incentive problem most of the time. In that case just asking should often work.
Science has very little feedback and lots of filtering at present. Preregistration is just more filtering. Science needs more feedback.
If researchers are now spending this extra time figuring out, right from the beginning of the study, exactly what they plan to do, and then filling out preregistration forms, what are they not doing instead?
Spending time on using a lot of different statistical techniques till one of them provides statistically significant results?
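As a rough illustration of why that matters, here’s a toy simulation (my own made-up example, not from the discussion): even when there is no real effect anywhere, “keep trying analyses until one comes up significant” produces far more p < 0.05 findings than a single pre-committed test.

```python
# Toy simulation: two groups drawn from the same distribution, so the null is true
# by construction. Trying several outcomes/tests and keeping the best one inflates
# the false-positive rate well above the nominal 5%.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n_outcomes, n_per_group = 2000, 5, 20

hits_single = 0  # "significant" results from one pre-committed test
hits_any = 0     # "significant" results if any of the tests will do
for _ in range(n_sims):
    a = rng.normal(0, 1, (n_outcomes, n_per_group))
    b = rng.normal(0, 1, (n_outcomes, n_per_group))
    pvals = [stats.ttest_ind(a[i], b[i]).pvalue for i in range(n_outcomes)]
    hits_single += pvals[0] < 0.05
    hits_any += min(pvals) < 0.05

print(f"false-positive rate, one pre-specified test: {hits_single / n_sims:.1%}")
print(f"false-positive rate, best of {n_outcomes} tests: {hits_any / n_sims:.1%}")
```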
What kind of feedback would you want to exist?