I’m curious about how others here process study results, specifically in psychology and the social sciences.
The p < 0.05 threshold for statistical significance is, of course, completely arbitrary. So when I get to the end of a paper and a result that came in at, say, p < 0.1 is described as “a non-significant trend favoring A over B,” part of me wants to just go ahead and update a little bit, treating it as weak evidence. But I obviously don’t want to do even that if there isn’t a real effect and the evidence is unreliable.
I’ve found that study authors are often inconsistent about this: they’ll “follow the rules” and report that no main effect was detected when walking you through the results, but then turn around and argue for the presence of a real effect in the discussion/analysis based on trends that aren’t individually significant.
The question of how to update is further compounded by (1) the general irreproducibility of these kinds of studies, which may indicate the need to apply some kind of global discount factor to the weight of any such study, and (2) the general difficulty of properly making micro-adjustments to belief models as a human.
This is exactly the situation where heuristics are useful, but I don’t have a good one. What heuristics do you all use for interpreting results of studies in the social sciences? Do you have a cutoff p-value (or a method of generating one for a situation) above which you just ignore a result outright? Do you have some other way of updating your beliefs about the subject matter? If so, what is it?
I don’t spend enough of my time reading study results that you should necessarily pay much attention to what I think. But: what you want to know is how much information it gives you that the study found (say) a trend with p=0.1, given that the authors may have been looking for such a trend and (deliberately or not) data-mining/p-hacking, and that publication filters out most studies that don’t find interesting results.
So here’s a crude heuristic:
There’s assorted evidence suggesting that (in softish-science fields like psychology) somewhere on the order of half of published results hold up on closer inspection. Maybe it’s really 25%, maybe 75%, but that’s the order of magnitude.
How likely is a typical study result ahead of time? A prior probability of around 1/4 might be typical.
In that case, getting a result significant at p=0.05 should be giving you about 4.5 bits of evidence but is actually giving you more like 1 bit.
So just discount every result you see in such a study by 3 bits or so. Crudely, multiply all the p-values by 10.
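To make the arithmetic concrete, here’s a rough Python sketch of that discounting (purely illustrative, not a standard method; the 1/4 prior and the roughly-half replication rate are just the crude figures assumed above):

```python
import math

# Rough sketch of the discounting heuristic above (my own framing of it).
# Assumptions baked in: take 1/p as the nominal likelihood ratio a
# "significant" result provides, a prior probability of ~1/4 that the
# hypothesised effect is real, and that only ~1/2 of published results
# hold up on closer inspection.

def nominal_bits(p):
    """Bits of evidence a p-value would carry if taken entirely at face value."""
    return -math.log2(p)

def discounted_bits(p, discount=3.0):
    """Apply the crude ~3-bit discount (roughly: multiply the p-value by 10)."""
    return max(nominal_bits(p) - discount, 0.0)

# What a typical published result actually seems to be worth: prior odds
# of 1:3 (probability 1/4) moving to roughly even odds (about half of
# published results hold up) is a likelihood ratio of ~3, i.e. ~1.6 bits.
prior_odds = (1 / 4) / (3 / 4)
posterior_odds = (1 / 2) / (1 / 2)
actual_bits = math.log2(posterior_odds / prior_odds)

for p in (0.05, 0.1):
    adj_p = min(p * 10, 1.0)  # the "multiply p-values by 10" shortcut, capped at 1
    print(f"p={p}: nominal {nominal_bits(p):.1f} bits, "
          f"discounted {discounted_bits(p):.1f} bits, adjusted p ≈ {adj_p:g}")
print(f"typical published result, empirically: ~{actual_bits:.1f} bits")
```

On those assumptions the nominal weight of a p=0.05 result shrinks to a bit over one bit, which is where the “3 bits or so” discount comes from.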
You might (might!) want to apply a bit less discounting in cases where the result doesn’t seem like one the researchers would have been expecting or wanting, and/or doesn’t substantially enhance the publishability of the paper, because such results are less likely to be produced by the usual biases. E.g., if that p=0.1 trend is an incidental thing they happen to have found while looking at something else, you maybe don’t need to treat it as zero evidence.
This is likely to leave you with lots of little updates. How do you handle that given your limited human brain? What I do is to file them away as “there’s some reason to suspect that X might be true” and otherwise ignore it until other evidence comes along. At some point there may be enough evidence that it’s worth looking properly, so then go back and find the individual bits of evidence and make an explicit attempt to combine them. Until then, you don’t have enough evidence to affect your behaviour much so you should try to ignore it. (In practice it will probably have some influence, and that’s probably OK. Unless it’s making you treat other people badly, in which case I suggest that the benefits of niceness probably outweigh those of correctness until the evidence gets really quite strong.)
Thank you! This is exactly what I was looking for. Thinking in terms of bits of information is still not quite intuitive to me, but it seems the right way to go. I’ve been away from LW for quite a while and I forgot how nice it is to get answers like this to questions.