There are a billion different ways studies can go wrong when statistics fail to correctly capture reality.

One very specific one is: your successful treatments will often look like they’re making things worse.

This is merely a special case of confounding, but it comes up often enough I want to highlight it. And it seems specifically important for the discussion of masks and respirators.

For example (not a doctor): every new leukemia drug has a survival rate of like 30%. Base rate survival is like 60%. Does every new leukemia drug work worse than normal? Obviously not. The patients trying the new drug are the ones for whom everything else has failed, which is why they’ve been signed up for a clinical trial. These patients have very resistant cancers. And we can verify this, because when the drug does get adopted, survival rates go up.

Doctors have mostly figured this out for this subset of leukemia drugs. They adjust for how resistant the crazy-resistant cancers in the trial are; when those adjustments aren’t good enough and show the drug has no effect or is bad, they throw the adjustments out and do something else (like compare the results of this new drug to some other new drug that serves almost like a control group). Eventually they seem to come to decent conclusions about whether to use the new drug in a setting that’s more conducive to sane trial results.

However, in a lot of cases, they don’t seem to figure this out, and it results in truly terrible treatment policies. The clearest example I’ve seen is in a few localized forms of cancer (like various sarcomas), where amputation shows vastly worse survival than local resection. As crazy as this seemed, it was remotely possible that the side effects of amputation were in fact so bad that this was true, but after looking into it extensively I am very sure that in most extremity amputations it is not. Doctors who see more aggressive tumors advocate amputation, these more aggressive tumors have worse overall survival regardless of how you treat them, and so amputation winds up looking statistically worse than local resection. And now the researchers who looked at this are advocating for fewer amputations, which will cost lives.

I’ve seen this in other cases too. The most recent: in patients with COPD, taking inhaled corticosteroids (ICS) is associated with higher pneumonia rates, by a factor of about 1.5. This makes sense mechanistically, because corticosteroids do reduce immune function. But pneumonia rates are ~8x higher in people with COPD! One of the first hypotheses that should jump out here is that people with mild COPD take ICS less than those with severe COPD, and that ICS helps those with severe COPD get pneumonia less often, but not enough to make up the full difference! I don’t know, maybe I’m the one missing something and people with severe COPD always use bronchodilators or something other than more ICS. But you at least have to address this in the study! And my hypothesis is supported by the fact that in asthma patients, ICS reduces pneumonia by a factor of ~2.

So, the pattern is that disease A has treatment X. Those with worse disease A have worse outcomes by, say, a factor of 3. Treatment X improves outcomes by a factor of 2 and is given preferentially or in higher doses to those with worse disease A because of cost or side effects of unclear size. An observational study comes along and sees that treatment X is associated with 1.5x worse outcomes than not-X! Or maybe just that outcomes are about the same. They denounce X as showing “no appreciable benefit” or “signs of harm”.
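To make the arithmetic in that pattern concrete, here is a minimal sketch. The factors of 3 and 2 come from the paragraph above; the 10% baseline risk and the extreme assumption that only severe patients receive X are made up for illustration.

```python
# Illustrative numbers only. BASELINE_RISK and the "only severe patients
# get X" assignment are assumptions, not data from any study.
BASELINE_RISK = 0.10      # bad-outcome rate for untreated mild disease (assumed)
SEVERITY_FACTOR = 3.0     # worse disease A -> 3x worse outcomes (from the text)
TREATMENT_FACTOR = 0.5    # treatment X halves bad outcomes (from the text)


def risk(severe: bool, treated: bool) -> float:
    """Probability of a bad outcome for one type of patient."""
    r = BASELINE_RISK
    if severe:
        r *= SEVERITY_FACTOR
    if treated:
        r *= TREATMENT_FACTOR
    return r


# What an observational study sees: treated patients are all severe,
# untreated patients are all mild.
naive_ratio = risk(severe=True, treated=True) / risk(severe=False, treated=False)

# What X actually does, comparing severe patients with severe patients.
true_ratio = risk(severe=True, treated=True) / risk(severe=True, treated=False)

print(f"naive risk ratio, X vs not-X: {naive_ratio:.2f}")
print(f"true risk ratio within severe patients: {true_ratio:.2f}")
```

Under these assumptions the naive comparison comes out at 1.5x worse for X even though X halves the risk for everyone who takes it.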

You won’t always see this pattern. With randomization, you won’t see it. If the treatment doesn’t change in dose or type with increasing disease burden, you won’t see it. Nor will you see it if the beneficial effect of the treatment significantly outpaces the harmful effects of increased disease. The pattern is worst in cases where an observational study looks at a disease with quickly-scaling outcomes and a treatment that is partial or underpowered (cf. cancer and COVID).
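The point about randomization can be checked with a small simulation. This is only a sketch under assumed numbers: half of patients are severe, severity triples risk, X halves it, and in the observational arm doctors give X to 90% of severe patients and 10% of mild ones. The function name `simulate` and all of these parameters are hypothetical.

```python
import random


def simulate(n: int, randomized: bool, seed: int = 0) -> float:
    """Return the observed risk ratio (treated vs untreated) in a simulated study."""
    rng = random.Random(seed)
    bad = {True: 0, False: 0}    # bad outcomes, keyed by whether treated
    total = {True: 0, False: 0}  # patient counts, keyed by whether treated
    for _ in range(n):
        severe = rng.random() < 0.5
        if randomized:
            # Coin flip that ignores severity.
            treated = rng.random() < 0.5
        else:
            # Assumed prescribing habit: severe patients usually get X.
            treated = rng.random() < (0.9 if severe else 0.1)
        p_bad = 0.10 * (3.0 if severe else 1.0) * (0.5 if treated else 1.0)
        total[treated] += 1
        if rng.random() < p_bad:
            bad[treated] += 1
    return (bad[True] / total[True]) / (bad[False] / total[False])


observational = simulate(200_000, randomized=False)
trial = simulate(200_000, randomized=True)
print(f"observational risk ratio: {observational:.2f}")
print(f"randomized risk ratio:    {trial:.2f}")
```

With these made-up inputs the observational ratio should land a bit above 1 (X looks mildly harmful), while the randomized ratio should land near 0.5, which is the effect that was built in.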

There’s some old joke about how taking supplement X is associated with low levels of X. It’s the same thing, just slightly more insidious.

Anyways, I’m concerned this pattern is showing up with the COVID discussions of masks and respirators. Both are things you’d common-sensically assume were helpful. Both show some weird nebulous signs of making things worse. (Certainly ventilators cause very serious side effects, but recently I saw some more serious claims about not boosting survival at all and thus being obviously net-costly. But of course, common sense says they have to be boosting survival some!) But if this is the news you’d expect to hear even in the world where there was no harm being done, the fact that you hear it is not much evidence that there’s in fact harm being done.
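That last sentence is just Bayes’ rule, and a rough sketch shows how little the update can be. Every probability below is invented for illustration; the function name `posterior` is hypothetical.

```python
def posterior(prior: float, p_news_if_harm: float, p_news_if_no_harm: float) -> float:
    """Probability the treatment is harmful after hearing 'treatment linked to worse outcomes'."""
    joint_harm = prior * p_news_if_harm
    joint_no_harm = (1 - prior) * p_news_if_no_harm
    return joint_harm / (joint_harm + joint_no_harm)


# Made-up inputs: a 10% prior that the treatment is net harmful.
# Case 1: confounding means you would hear this news almost as often without harm.
confounded = posterior(prior=0.10, p_news_if_harm=0.9, p_news_if_no_harm=0.8)
# Case 2: the news would be rare without harm (e.g. a well-run randomized trial).
clean = posterior(prior=0.10, p_news_if_harm=0.9, p_news_if_no_harm=0.1)

print(f"posterior with confounded evidence: {confounded:.2f}")
print(f"posterior with clean evidence:      {clean:.2f}")
```

With these numbers the confounded news moves a 10% prior to about 11%, while the same headline from a clean source moves it to 50%.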

I feel irresponsible for posting this without doing too much investigation of the masks and ventilators, because plausibly this is pointing in the wrong policy direction on them, but I don’t have time for that at the moment. But in the meantime, I would like if every excited contrarian buzz about a treatment showing counterintuitive harm could be accompanied by SOME statement addressing the fact that you’d expect to see those results regardless of harm.

ETA: Some evidence SARS-CoV-2 attacks hemoglobin, which could inhibit gas exchange enough to cause oxygen poisoning in the lungs if oxygen concentrators or ventilators are used. On the other hand, a response on chemrxiv says the paper is terrible, and I haven’t looked into it. None of this changes my feelings toward how people should be reacting to or talking about the observation that tons of people on ventilators die.

Treatments correlated with harm

Connor_Flexman 16 Apr 2020 21:02 UTC

40 points

World Modeling


Vaniver 17 Apr 2020 19:30 UTC
4 points
“But of course, common sense says they have to be boosting survival some!”
But ‘common sense’ is just one particular causal model, and not obviously the correct one. If you think the problem is “the lungs are filling up with fluid, and so you can’t get oxygen to the blood” then ventilators seem obviously effective (tho a drainage solution would be even better, and I recklessly speculate this is one of the reasons people are trying proning); if you think the problem is “hemoglobin is being denatured such that the blood no longer effectively transmits oxygen” then the story for ventilators helping falls apart.
“But in the meantime, I would like if every excited contrarian buzz about a treatment showing counterintuitive harm could be accompanied by SOME statement addressing the fact that you’d expect to see those results regardless of harm.”
Surely the actual solution here is superior statistical techniques, right? I thought one of the primary value-adds of the causal modeling paradigm was being able to more accurately estimate these sorts of effects in the presence of confounders.
Like, there must have been some threshold of survival for the leukemia drugs at which point it’s not worth switching to them, and presumably we can quantitatively calculate that threshold, and doing so is better than having a qualitative sense that there’s an effect to be corrected for here.
- Connor_Flexman 17 Apr 2020 21:01 UTC
  2 points
  Yeah, I didn’t mean to imply that causal modeling wasn’t the obvious solution—you’re right about the existence of the leukemia threshold. But I guess in my experience of these mistakes, I often see people taking the action “try to do superior statistical techniques” and that not working for them (including rationalists and not just terrible science reporting sites), whereas I think “identify the places where your model is terrible and call that out” is a better first step for knowing how to build the superior models.
  In the ventilator case, for example, I’m not trying to advocate blindly following common sense, but I do think it’s important to incorporate common sense heavily. If people said, “There’s no evidence for respirators working, maybe hemoglobin is being denatured”, I certainly wouldn’t advocate for more common sense. But instead I tend to see “the statistics show respirators aren’t working, maybe we shouldn’t use them”, which seems to imply that common sense isn’t being given a say at all. It seems to me like always having common sense as one of your causal models is both an easy sell and a vital piece of the machine making sure your statistical techniques don’t go off the rails at any of their many opportunities.