There were also a number of risk factors where the treatment and control groups had significant differences, most notably diabetes (present in 2.5x as many patients in the control group).
Firstly, two risk factors were more common in the treatment group: <60 years of age, immunosuppressed & transplanted. Secondly, 3 treatment group patients (6%) were diabetic, versus 5 control patients (19.23%). Let us take the most generous assumptions for your position, and say that the 3 patients with diabetes in the treatment group did not require ICU and that all 5 in the control group did. This is a strong (i.e., unlikely) assumption.
With these generous assumptions, the study results become 1/47 treatment patients requiring ICU versus 8/21 control patients. The p-value remains .0001. *In order to achieve a p<.05, the lack of blinding/fuzziness would have had to fail to send 16 of the 46 treatment group members to the ICU.* That is still not likely without deliberate fraud.
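For anyone who wants to check the adjusted numbers themselves, here is a minimal sketch of a two-sided Fisher exact test on the 1/47 vs 8/21 table. Using Fisher's test is an assumption on my part (the comment does not say which test produced its p-value), and the function name `fisher_exact_two_sided` is just illustrative.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups, columns are outcomes. Sums the probability of every
    table with the same margins that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)  # ways to choose which patients are col-1 cases

    def weight(k):
        # Unnormalised hypergeometric weight for k col-1 cases landing in row 1.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Integer comparison avoids floating-point ties between equally likely tables.
    extreme = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return extreme / total

# Adjusted counts from the comment (assumed test: Fisher exact):
# treatment: 1 ICU, 46 no ICU; control: 8 ICU, 13 no ICU.
p_adjusted = fisher_exact_two_sided(1, 46, 8, 13)
print(f"p = {p_adjusted:.6f}")
```

Under these assumptions the result comes out well under .05. Swapping in other hypothetical counts shows how many cases would need to move before that changes.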
I think it is a really bad idea to disincentivise medium-effort comments that point out problems with important studies. I don’t want it to be a requirement that you know statistics before you get to question a study.
I apologize; I made an error in my original comment. I was actually referring to high blood pressure rather than diabetes. 15 out of the 26 people in the control group had high blood pressure, which is greater than the number of people who needed ICU care. Using your (maximally generous) assumptions, we would have zero non-hypertensive patients from either group needing ICU care.
> Firstly, two risk factors were more common in the treatment group: <60 years of age, immunosuppressed & transplanted.
Absolutely true, but the overall risk factor prevalence was still significantly higher in the control group. Furthermore, I’m not sure if all risk factors are created equal. Regardless, the overall point is that the two groups had significant differences in important characteristics.
> *In order to achieve a p<.05, the lack of blinding/fuzziness would have had to fail to send 16 of the 46 treatment group members to the ICU.* That is still not likely without deliberate fraud.
I think it’s more likely that they sent a few of the control group members to the ICU unnecessarily. If you figure that the difference in risk factors between the two groups accounts for a couple of the extra ICU cases, the placebo effect accounts for another couple, and unnecessary ICU admission accounts for another couple, it brings the p-value up pretty dramatically. I’m not statistically literate enough to know how to properly adjust for those factors and get an exact number, but it doesn’t seem to require deliberate fraud.
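One crude way to put numbers on this is to reclassify control-group ICU cases as non-ICU, a couple at a time, and recompute the p-value after each step. The sketch below assumes unadjusted counts of 1/50 (treatment) and 13/26 (control), which is what the adjusted 1/47 and 8/21 figures in this thread imply once the diabetic patients are added back. It also assumes a two-sided Fisher exact test. It is not a proper adjustment for confounding, only a sensitivity check, and the names `fisher_p` and `results` are hypothetical.

```python
from math import comb

def fisher_p(icu_t, n_t, icu_c, n_c):
    """Two-sided Fisher exact p-value for ICU counts in two groups."""
    icu_total = icu_t + icu_c
    total = comb(n_t + n_c, icu_total)

    def weight(k):
        # Unnormalised probability of k ICU cases falling in the treatment group.
        return comb(n_t, k) * comb(n_c, icu_total - k)

    observed = weight(icu_t)
    lo, hi = max(0, icu_total - n_c), min(n_t, icu_total)
    return sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed) / total

# Assumed unadjusted counts implied by the thread: 1/50 treatment, 13/26 control.
ICU_T, N_T, ICU_C, N_C = 1, 50, 13, 26

# Reclassify 0..6 control ICU cases as "did not really need ICU".
results = {}
for removed in range(0, 7):
    results[removed] = fisher_p(ICU_T, N_T, ICU_C - removed, N_C)
    print(f"{removed} control ICU cases reclassified: p = {results[removed]:.2e}")
```

This only moves cases out of the control group's ICU column. Moving treatment-group patients into the ICU column instead is a one-line change to the loop.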
Just to be clear, I still think that there is probably at least some sort of real effect here. I’m just advocating caution in interpreting the results of a tiny study with clear flaws. I don’t really understand why there was no placebo control or double blinding, and that makes me more suspicious that there are other flaws that I’m not educated enough to notice. For example, the way that they describe the ICU admission criteria suggests that the presence of a comorbidity is itself a factor for ICU admission. If that’s the case, the differences in the risk factor numbers become even more important.
https://twitter.com/AlvaroDeMenard/status/1304399452816519169/photo/1 - probability of reproduction from one forecaster for the DARPA SCORE study. The distribution is bimodal because the lower hump is p=.05 and the upper hump is some lower p-value like p=.01 or something. It looks like even the studies with stronger p-values have reproduction rates of only .8 to .9. This updates toward Shem’s skepticism. Even though the p-value is very small, the reproduction rate is still stuck at .85 for good studies. Since this study has the problems Shem pointed out, we might expect a lower reproduction probability, like .75. So a likelihood ratio of 3:1.
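To spell out the arithmetic behind the 3:1: it is what you get from converting a .75 probability into odds. The sketch below assumes that reading (the comment calls it a likelihood ratio, so treating it as plain odds is my interpretation), and `probability_to_odds` is a hypothetical helper name.

```python
from fractions import Fraction

def probability_to_odds(p):
    """Convert a probability to odds in favour, i.e. p : (1 - p)."""
    p = Fraction(p)
    return p / (1 - p)

# .75 is the guessed reproduction probability for this study,
# .85 the rough figure for "good" studies from the forecast.
odds_flawed = probability_to_odds(Fraction(3, 4))
odds_good = probability_to_odds(Fraction(17, 20))

print(f"{odds_flawed.numerator}:{odds_flawed.denominator}")  # prints 3:1
print(f"{odds_good.numerator}:{odds_good.denominator}")      # prints 17:3
```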
I could imagine people with knowledge of the subject having sufficiently lower priors.
You’re right, thanks for that. I removed that sentence and changed the tone a bit.