Two results to p<0.05 out of a total of two relevant experiments is pretty good.
You are misreading the situation. You have two p<0.05 results out of two published studies and out of a total of no one knows how many relevant experiments.
Where is it written that no amount of bad evidence can add up to good evidence?
Thus spake the Book of Selection Bias. And the Book of Making Shit Up sagely nodded and said “Verily this is so”.
Well, fine, but in order to get two p<0.05 results by chance if there’s no effect, there’d have had to be roughly forty failed experiments done and never mentioned. Is that likely?
But the question’s easy to settle if that’s what we really think might have happened. Just pre-register and replicate one.
I really think that if there were enough interest in the question that it had prompted forty failures, someone would have had the wit to do that. Don’t you?
I mean, we’ve got an a priori plausible hypothesis, lots of evidence for, some of it solid, most of it weak, no evidence against. There would have to be one hell of a filter somewhere to justify ignoring that. Wouldn’t there?
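For what it's worth, the "roughly forty" figure can be checked with a few lines of arithmetic. This is only a sketch under the idealised assumptions the number itself relies on: every experiment is independent, there is no real effect, and each one has exactly a 5% chance of reaching p<0.05. The function names are made up for illustration.

```python
from math import comb

ALPHA = 0.05  # significance threshold used in the thread


def expected_experiments(hits: int, alpha: float = ALPHA) -> float:
    """Expected number of null experiments needed before `hits` of them
    come out significant (mean of a negative binomial distribution)."""
    return hits / alpha


def prob_at_least(hits: int, n: int, alpha: float = ALPHA) -> float:
    """Probability that at least `hits` of `n` independent null
    experiments reach p < alpha (binomial tail)."""
    below = sum(
        comb(n, k) * alpha**k * (1 - alpha) ** (n - k) for k in range(hits)
    )
    return 1.0 - below


# Expected total number of experiments for two chance hits.
print(expected_experiments(2))
# Chance that two out of two null experiments both hit.
print(prob_at_least(2, 2))
# Chance of at least two hits somewhere among forty null experiments.
print(prob_at_least(2, 40))
```

Under these assumptions the expected total is forty experiments, so about thirty-eight unmentioned failures, and two hits out of exactly two tries would happen by chance about one time in four hundred.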
Well, fine, but in order to get two p<0.05 results by chance if there’s no effect, there’d have had to be roughly forty failed experiments done and never mentioned. Is that likely?
First, that forty number comes from the spherical-cow land where everything is independent, normally distributed, gardens of forking paths do not exist, etc. etc. Look at the replication crisis in psychology, which has been getting a lot of press recently. They have dozens of papers showing highly significant results for some effect which, as it turns out now, does not exist.
Second, consider the incentives. Say you ran a small trial and got a null result. What are you going to do with it? No journal will be very excited about the “we tried a weird thing and it didn’t work” paper. Or, say, you got negative results: your patients started dying. Would you be terribly interested in writing up “we tried a weird thing and ended up killing some people” results?
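The first point can be put in rough numbers. The sketch below assumes, purely for illustration, that each experiment allows some number of independent analyses (different endpoints, subgroups, cut-offs) and gets reported as a success if any one of them reaches p<0.05. Real forking paths are correlated, so treat this as a caricature of the effect rather than a model of any actual study. The function names and the choice of 1, 5 and 10 analyses are hypothetical.

```python
ALPHA = 0.05  # nominal significance threshold


def effective_alpha(analyses: int, alpha: float = ALPHA) -> float:
    """Chance that at least one of `analyses` independent looks at
    null data reaches p < alpha (assumes independence; hypothetical)."""
    return 1.0 - (1.0 - alpha) ** analyses


def experiments_for_two_hits(analyses: int, alpha: float = ALPHA) -> float:
    """Expected number of null experiments needed to collect two
    'significant' results at the inflated false-positive rate."""
    return 2.0 / effective_alpha(analyses, alpha)


for k in (1, 5, 10):
    print(k, effective_alpha(k), experiments_for_two_hits(k))
```

With one analysis per experiment this reproduces the forty. With five analyses the expected number drops to under nine, and with ten it is about five, so the file drawer needed to explain two chance hits gets much smaller.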
Wouldn’t there?
You need to convince not me, but people who are good at writing grant proposals and passing ethics boards :-/
You need to convince not me, but people who are good at writing grant proposals and passing ethics boards :-/
Sure, but I have now managed to convince myself, so I am practising. Please be the most evil opponent you can be!
Second, consider the incentives.
Incentives-wise, doctors are regularly getting struck off for believing and practising this, all over the world. And the NICE guidelines specifically say that thyroxine is not to be used for the treatment of CFS. You’d think they’d be overjoyed to have a reference to quote when striking people off/writing guidelines.
Would you be terribly interested in writing up “we tried a weird thing and ended up killing some people” results?
Well, I think that’s exactly what the Scottish GPs thought they were writing up! They found ‘no difference from placebo’ in their patient group, and ‘harmful’ in their control group.
Where we differ is that I think that means that they must actually have done a fair bit of good in their patient group (specifically, in that portion of their patient group who actually had type 2 hypothyroidism, and for whom 100 µg thyroxine/day was roughly the right amount). According to Skinner, that dose would have been too much for many of them, and too little for many others.
Two counts as “a couple of studies”.
That’s equivalent to “no good evidence”. You can’t make up for quality with quantity here.
Now we are in actual disagreement. Two results to p<0.05 out of a total of two relevant experiments is pretty good.
If there were a further four with ‘no effect’ I might be less impressed. But there aren’t, unless they’re in a file drawer somewhere.
Where is it written that no amount of bad evidence can add up to good evidence? Evidence is evidence, Bayes-wise.
I say that the sun rises every morning. There has been no PCRT. Am I likely mistaken?
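To make "evidence is evidence, Bayes-wise" concrete, here is a minimal sketch of how individually weak observations combine when they are multiplied as likelihood ratios. The prior of 0.1 and the likelihood ratio of 1.5 per observation are invented numbers, and the calculation assumes the observations are independent and not filtered by publication, which is exactly what is in dispute in this thread.

```python
from math import prod


def posterior_probability(prior: float, likelihood_ratios: list[float]) -> float:
    """Update a prior probability with independent pieces of evidence,
    each expressed as a likelihood ratio P(data | H) / P(data | not H)."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Ten weak observations, each only 1.5 times likelier if the hypothesis is true
# (hypothetical values chosen for illustration).
weak_evidence = [1.5] * 10
print(posterior_probability(0.1, weak_evidence))
```

With these made-up inputs the posterior comes out around 0.87. If a selection filter means each reported observation is about as likely whether or not the hypothesis is true, the ratios collapse towards 1 and the posterior stays at the prior.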