1. For health-related research, one of the main failure modes I’ve observed when people I know try to do this is tunnel vision and a lack of priors about what’s common and relevant. Reading raw research papers before you’ve read broad overviews will make this worse, so read UpToDate first and Wikipedia second. If you must read raw research papers, find them with PubMed, but do this only rarely and only with a specific question in mind.
2. Before looking at the study itself, check how you got there. If you arrived via a search engine query that asked a question or posed a topic without presupposing an answer, that’s good; if there are multiple studies that say different things, you’ve sampled one of them roughly at random. If you arrived via a query that asked for confirmation of a hypothesis, that’s bad; if there are multiple studies that say different things, you’ve sampled in a way that is biased towards that hypothesis. If you arrived via a news article, that’s the worst; if there are multiple studies that say different things, you’ve sampled in a way that is biased away from reality.
3. Don’t bother with studies in rodents, animals smaller than rodents, cell cultures, or undergraduate psychology students. These studies are done in great numbers because they are cheap, but they have low average quality. The fact that they are so numerous makes the search-sampling problems in (2) more severe.
4. Think about what a sensible endpoint or metric would be before you look at what endpoint or metric was reported. If the reported metric is not the one you expected, this will often be because the result on the relevant metric was terrible. Classic examples are papers about battery technologies reporting power rather than capacity, and biomedical papers reporting effects on biomarkers rather than symptoms or mortality.
5. Correctly controlling for confounders is much, much harder than people typically give it credit for. Adding extra things to the list of things controlled for can create spurious correlations (for example, when the added variable is itself affected by both the exposure and the outcome), and study authors are not incentivized to handle this correctly. The practical upshot is that observational studies only count if the effect size is very large.
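
To make point (5) concrete, here is a minimal simulation sketch of how controlling for the wrong variable can manufacture a correlation. Everything in it is an assumption for illustration: `x` and `y` are hypothetical, independently generated variables, `c` is a hypothetical variable caused by both of them, and "controlling for" is modeled as a simple linear adjustment (a partial correlation), which is only one of the ways real studies adjust.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, c):
    """Correlation of x and y after linearly adjusting both for c."""
    r_xy, r_xc, r_yc = pearson(x, y), pearson(x, c), pearson(y, c)
    return (r_xy - r_xc * r_yc) / math.sqrt((1 - r_xc**2) * (1 - r_yc**2))


def simulate(n=20000, seed=0):
    """Return (raw, adjusted) correlations between two independent variables."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical exposure
    y = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical outcome, independent of x
    # c is caused by both x and y, plus noise (illustrative assumption)
    c = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]
    return pearson(x, y), partial_corr(x, y, c)


raw, adjusted = simulate()
print(f"raw correlation: {raw:+.2f}")
print(f"adjusted for c:  {adjusted:+.2f}")
```

With this setup the raw correlation should come out near zero, while the adjusted one should come out strongly negative (around -0.5 in theory), even though `x` has no effect on `y` at all. A paper reporting only the adjusted number would look like it had found something.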