Sample size is related to how big an effect size should surprise you, i.e. statistical power. A big effect size in a small sample is less surprising, because noisy estimates throw up big effects more easily. Why is there no overall rule of thumb? Because it gets modified a lot by the base rate of whatever you're looking at, plus some other factors I'm not remembering off the top of my head.
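As a rough illustration of the power point (a sketch, not a rule of thumb): you can ask what the smallest effect a study of a given size could reliably detect is. The alpha/power values below are just the conventional 0.05/0.8 defaults, and the example assumes statsmodels is available.

```python
# Sketch: minimum detectable effect (Cohen's d) for a two-sample t-test
# at conventional alpha = 0.05 and power = 0.8, across sample sizes.
# Illustrative only -- assumes statsmodels is installed.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for n_per_group in [10, 30, 100, 300, 1000]:
    # solve_power fills in whichever argument is None -- here, effect_size.
    d = analysis.solve_power(effect_size=None, nobs1=n_per_group,
                             alpha=0.05, power=0.8, ratio=1.0)
    print(f"n = {n_per_group:>4} per group -> detectable d ~ {d:.2f}")
```

The pattern is the point: a small study can only reliably detect a big effect, so a big reported effect from a small sample carries less information than it looks like it does.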
In general I'd say there's enough methodological diversity that there's a lot of stuff I'm looking for as flags that a study wasn't designed well. For examples of such flags, look at the inclusion criteria for meta-analyses.
There are also more qualitative things, like how much I'm willing to extrapolate based on the study authors' own discussion section. In the longevity posts, for example, I laud a study whose discussion section explicitly spends a great deal of time on what is *not* reasonable to conclude from the study, even though those things might be suggestive of further research directions.
Confounds are kind of like building a keyword map: I look at the most well-regarded studies in a domain, note down what they're controlling for, then discount studies that aren't controlling for those things, to varying degrees (a toy sketch of this follows below). This is another place where qualitative judgement creeps in, even in Cochrane reviews, where they're forced to develop ad hoc 'tiers' of evidence (A, B, C, etc.) and give some guidelines for assigning them.
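To make the keyword-map idea concrete, here's a toy sketch of the bookkeeping; the study names, confounder labels, and penalty weight are all made up for illustration, and this is my informal habit, not any formal Cochrane procedure.

```python
# Toy sketch of the "confound keyword map": build a reference set of
# confounders from well-regarded studies, then down-weight studies that
# don't control for them. All names and weights are illustrative.
reference_studies = {
    "StudyA": {"age", "sex", "smoking", "baseline_bmi"},
    "StudyB": {"age", "sex", "smoking", "income"},
}
# Confounders that the best-regarded studies bother to control for.
expected_controls = set.union(*reference_studies.values())

def discount(study_controls, expected=expected_controls, penalty=0.15):
    """Crude credibility multiplier: lose `penalty` per missing confounder."""
    missing = expected - study_controls
    return max(0.0, 1.0 - penalty * len(missing)), missing

weight, missing = discount({"age", "sex"})
print(f"weight ~ {weight:.2f}, missing controls: {sorted(missing)}")
```

In practice the "discounting" is a gut judgement rather than a fixed penalty per item, which is exactly where the qualitative tiers come from.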
I have higher skepticism in general than I did years ago, as I've learned about the number of ways effects can sneak into the data despite the honest intentions of moderately competent scientists. I'm also much more aware of a fundamental selection-effect problem: anyone running a study has some vested interest in how the hypotheses are framed, because nobody devotes themselves to something about which they're completely disinterested. This shows up as a problem in your own evaluations too, in that it's almost impossible not to sneak in isolated demands for rigor based on your priors.
I'm also generally reading over the shoulder of whichever other study reviewers seem to be doing a good job in a domain; epistemics is a team sport. An example is when Scott did a roundup of the evidence on low-carb diets, mentioning lots of other people doing meta-reviews and speculating about why they reached different conclusions: e.g. Luke Muehlhauser and I came down on the side that the very-low-carb (VLC) evidence seemed weak, while Will Eden came down on the side that it seemed more robust, with the difference seemingly coming down to how much weight we each placed on inside-view metabolic models versus outside-view long-term studies.
That's a hot take, though. It can be hard to just dump top-level heuristics, versus seeing what comes up from more specific questions and discussion.