In general I think distinguishing ideas analogous to pre-vitamin-C scurvy from ideas like epicycles is one of the Hamming problems of science, and would cheer development in the area.
I’ll take a stab at this.
It seems like the key distinction is separating ideas that are heavily but still inadequately tested (epicycles) from those that are underexplored (pre-vitamin-C scurvy) and those that are thoroughly tested (the pathogen theory of disease).
Researchers are bound by convention, resources, and the limits of their own imagination and enthusiasm. As such, they often wind up using experimental methods that can neither rule out confounders nor yield a better mechanistic account of the phenomena at hand. Instead, they just put out another study as meta-analysis fodder.
Take this sleep research. We can find dozens of studies showing that X hours of sleep deprivation results in Y amount of cognitive deficit. But have the scientists designed studies to test for weird confounds?
Maybe being sleepy doesn’t make you cognitively slow, but instead makes you socially anxious. Being around a bunch of strange scientists subjecting you to unusual tests in an unfamiliar environment could trigger anxiety, which in turn could explain some of the cognitive deficits in the sleep research. Sleepiness → social anxiety → cognitive deficits around strangers. That hypothesis took me all of 1 minute to come up with, and it might even be true; who knows? If resources for testing it were unlimited, we could just go ahead and do it.
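To make the indistinguishability concrete, here’s a minimal simulation sketch with made-up effect sizes (not real sleep data): if deprivation acted only through lab-induced anxiety, the standard deprive-then-test design would show the same deficit as a direct effect would.

```python
# A minimal sketch, with hypothetical effect sizes, of why the standard
# "deprive, then test in the lab" design can't tell the two stories apart:
# simulate both causal structures and look at what the experimenter observes.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
deprived = rng.integers(0, 2, n)                      # 0 = rested, 1 = sleep-deprived

# Story A: deprivation directly slows cognition.
score_direct = 100 - 10 * deprived + rng.normal(0, 5, n)

# Story B: deprivation only raises social anxiety, and anxiety (triggered by
# being tested by strangers in an unfamiliar lab) is what hurts performance.
anxiety = 2 * deprived + rng.normal(0, 1, n)
score_mediated = 100 - 5 * anxiety + rng.normal(0, 5, n)

for label, score in [("direct effect", score_direct), ("via anxiety", score_mediated)]:
    gap = score[deprived == 0].mean() - score[deprived == 1].mean()
    print(f"{label}: observed deficit of about {gap:.1f} points")

# Both stories print a deficit of roughly 10 points, so the usual design can't
# separate them; you'd need to also vary the testing context (e.g., a familiar
# setting with no strangers) to discriminate.
```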
If it takes me 1 minute to come up with a plausible, probably-untested alternative hypothesis in a field that has produced dozens and dozens of studies, that’s a sign the field is designing its experiments inefficiently, and we should discount the apparent strength of its findings. This doesn’t mean its current hypothesis is wrong, but that the field has disguised alternative hypotheses by neglecting to test them, and thus created a false impression of consensus.
If it’s not possible to test all the hypotheses we’d ideally like to explore, then too bad. Scientists don’t get to say they’ve discovered “the truth” just because they’ve hit the current limit of their experimental capacities. They just have to admit that they’re not sure. In this case, we would start by pointing out that we have extremely limited mechanistic information on exactly how sleep and extended wakefulness physically alter the brain, and how any such alterations turn into cognitive impacts. By contrast, we have abundant (if still far from completely sufficient) physical information about how other complex physiological systems work, such as the immune system.
So in general, we ought to rank our certainty not by how much evidence we’ve accumulated, but by how many alternative hypotheses we’ve ruled out (this idea is certainly not original to me). Unfortunately, either there’s no good way to quantify this, or there is, and it’s just not conventional for scientists to do so when designing their experiments and writing their literature reviews.
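One rough way to see why ruled-out alternatives matter more than raw study counts, in toy Bayesian terms (my own framing of the idea, not an established method): if every study’s result is predicted about equally well by an untested alternative, accumulating more studies never concentrates the posterior on the favored hypothesis, while a single discriminating study does.

```python
# Toy sketch: posteriors over two hypotheses after repeated studies.
# Evidence both hypotheses predict equally well never moves the posterior,
# no matter how much of it accumulates.
import numpy as np

def posterior(per_study_likelihoods, n_studies):
    """per_study_likelihoods[h] = P(a typical study's result | hypothesis h)."""
    lik = np.array(per_study_likelihoods, dtype=float) ** n_studies
    prior = np.full(len(lik), 1.0 / len(lik))   # equal prior weight on each hypothesis
    post = prior * lik
    return post / post.sum()

# H0: deprivation directly slows cognition.  H1: it acts via lab-induced anxiety.
print(posterior([0.8, 0.8], n_studies=50))  # [0.5, 0.5]: fifty non-discriminating studies, no progress
print(posterior([0.8, 0.1], n_studies=1))   # ~[0.89, 0.11]: one discriminating study, real progress
```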
Based on my limited experience in the lab environment, I can already see firsthand the degree to which resource constraints, convention, careerism, and an honest assessment of one’s own strengths and limitations as a scientist dominate decisions about what to study and how to study it. “Important if true” gives way to other concerns, like “who will pay to find out?”
I think science tends to get the biggest big things right, and that’s a profound accomplishment. But if a question has significant nuance, sits in a niche subfield, and lacks crisp methods and unequivocal metrics, then you’re at the frontier and should retain durable skepticism about the field.