Suppose you summarize the effect of a drug using a relative risk (a multiplicative effect parameter relating the probability of the event if treated to the probability of the event if untreated), and consider this multiplicative parameter to represent the “magnitude of the effect”.
The natural thing for a clinician to do will be to assume that the magnitude of the effect is the same in their own patients. They will therefore rely on this specific scale for extrapolation from the study to their patients. However, those patients may have a different risk profile.
When clinicians do this, they will make different predictions depending on whether the relative risk is based on the probability of the event, or the probability of the complement of the event.
Sheps’ solution to this problem is the same as mine: If the intervention results in a decrease in the risk of the outcome, you should use the probability of the event to construct the relative risk, whereas if the intervention increases the risk of the event, you should use the probability of the complement of the event.
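In symbols, as a rough sketch: the notation here is an assumption for illustration and is not taken from the manuscript. Write p_0 and p_1 for the study's risk of the event if untreated and if treated, and q_0 for the baseline risk in the clinician's own patients. The two ways of carrying the effect over to those patients are then:

```latex
\begin{align*}
\mathrm{RR}_{\text{event}} &= \frac{p_1}{p_0},
  & \hat{q}_1 &= q_0 \cdot \frac{p_1}{p_0} \\[4pt]
\mathrm{RR}_{\text{complement}} &= \frac{1 - p_1}{1 - p_0},
  & \hat{q}_1 &= 1 - (1 - q_0) \cdot \frac{1 - p_1}{1 - p_0}
\end{align*}
```

For 0 < p_0 < 1, the two predictions agree only when q_0 = p_0 or p_1 = p_0, which is why the choice of scale matters once the patient's risk profile differs from the study's. The rule above uses the first line when p_1 < p_0 and the second when p_1 > p_0; in both cases the predicted risk stays between 0 and 1.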
My next question was whether this was “just pedagogy and communication to help people avoid dumb calculation mistakes” or a real and substantive issue, and then I watched the video...
And it is nice that the video is only 4:51 long and works “just as well” on 2X…
And… I think basically the claim is NOT just pedagogical, but substantive, but it was hard to notice.
I’ve swirled your content around, and in doing so I feel like I’ve removed the stats confusion and turned it around so it sharpens the way the core question is about physical reality and modeling intuitions… Here is an alternative abstract that I offer for your use (as is, or with edits) if you want it <3
Imagine that an enormous high powered study samples people in general, and 1% of the control group has allergies on placebo, while 2% on “the real drug” have an allergic reaction. Then a specific patient from a subpopulation where 10% of the people have separately been measured to be predisposed to allergies, comes to a doctor who then tries to weigh treatment benefits vs complication risks. Assume the doctor is rushed, and can’t do a proper job (or perhaps lacks a relevant rapid test kit and/or lacks the ability to construct tests from first principles because of a brutally restrictive regulatory environment in medicine) and so can only go off of subpopulation data without measuring the patient for direct mechanistic pre-disposing allergy factors. What is the maximally structurally plausible probability of an allergic reaction, as a complication for that patient, in response to treatment: ~2% or ~11% or ~20%? This fact of the matter, at this clinical decision point, could itself be measured empirically. Textbooks that speak to this say ~20%, but those textbooks are wrong because they have no theory of external reality and are basically just cargo-culting bad practices that have been bad for over half a century. The simplest Pearlian causal model (see figure 1) should be preferred based on Occam’s razor and says ~11%. Mindel C. Sheps proposed the right answer using related intuitions in 1958 but she has been ignored (and sometimes made fun of) because our field is systematically bad at reasoning thoughtfully about physical reality. This paper aims to correct this defect in how the field reasons.
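For concreteness, the three candidate answers in the abstract can be reproduced with a few lines of arithmetic. This is only a sketch: the function and variable names are made up for illustration, and it assumes the "~11%" answer comes from holding the ratio of non-event probabilities constant (the complement-based relative risk), while "~20%" comes from holding the ordinary relative risk constant.

```python
def extrapolate_event_rr(p0, p1, q0):
    """Hold the ordinary relative risk p1/p0 constant and apply it to baseline risk q0."""
    return q0 * (p1 / p0)


def extrapolate_complement_rr(p0, p1, q0):
    """Hold the ratio of non-event probabilities (1-p1)/(1-p0) constant instead."""
    return 1.0 - (1.0 - q0) * ((1.0 - p1) / (1.0 - p0))


# Numbers taken from the abstract above.
study_control = 0.01      # allergic reactions on placebo
study_treated = 0.02      # allergic reactions on the drug
patient_baseline = 0.10   # predisposition rate in the patient's subpopulation

# Candidate 1: ignore the patient's baseline and copy the study's treated risk.
ignore_baseline = study_treated
# Candidate 2: standard relative risk based on the probability of the event.
standard = extrapolate_event_rr(study_control, study_treated, patient_baseline)
# Candidate 3: relative risk based on the complement of the event.
complement = extrapolate_complement_rr(study_control, study_treated, patient_baseline)

print(f"copy the study:       {ignore_baseline:.1%}")  # 2.0%
print(f"event-based RR:       {standard:.1%}")         # 20.0%
print(f"complement-based RR:  {complement:.1%}")       # 10.9%
```

The drug increases the risk here, so the rule quoted at the top of the thread would pick the complement-based calculation, which gives roughly 11%.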
Does this seem like a friendly proposal (friendly to you, not friendly to the field, of course) for a better abstract, that focuses on a concrete example while pushing as hard as possible on the central substantive issue?
I admit: peer reviewers would probably object to this.
Also I did intentionally “punch it up” more than might be justified in hopes that you’ll object in an informative way here and now. My abstract is PROBABLY WRONG (one example: I know your figure 1 is not a Pearlian model) but I hope it is wrong in the way a bold sketch is less accurate than a more painterly depiction, while still being useful to figure out what the central message can or should be.
Thank you so much for writing this! Yes, this is mostly an accurate summary of my views (although I would certainly phrase some things differently). I just want to point out two minor disagreements:
I don’t think the problem is that doctors are too rushed to do a proper job. I think the patient-specific data that you would need is in many cases theoretically unobservable, or at least that we would need a much more complete understanding of biological mechanisms in order to know what to test the patients for in order to make a truly individualized decision. At least for the foreseeable future, I think it will be impossible for doctors to determine which patients will benefit on an individual level. They will be constrained to using the patient’s observables to put them in a reference group, and then using that reference group to predict risk based on observations from other patients in the same reference group.
I am not entirely convinced that the Pearlian approach is the most natural way to handle this. In the manuscript, I use “modern causal models” as a more general term that also includes other types of counterfactual causal models. Of course, all these models are basically isomorphic, and Cinelli/Pearl did show in response to my last paper that it is possible to do the same thing using DAGs. I am just not at all convinced that the easiest way to capture the relevant intuition is to use Pearl’s graphical representation of the causal models.
Assume the doctor is rushed, and can’t do a proper job (or perhaps lacks a relevant rapid test kit and/or lacks the ability to construct tests from first principles because of a brutally restrictive regulatory environment in medicine) and so can only go off of subpopulation data without measuring the patient for direct mechanistic pre-disposing allergy factors
Even if the doctor could run all the tests they desire on the patient, the original study that said 1% of the control group and 2% on the real drug does not contain information about what pre-disposing allergy factors the patients in the trial had.
The official study is neither the beginning nor the end of knowledge. If people were being really competent and thorough, the study could have collected all kinds of additional patient metadata.
The patient’s body is made of atoms that move according to physical laws. It is basically a machine. With the correct mechanistic modeling (possibly very very complicated) grounded in various possible measurements (some simple, some maybe more complicated) all motions of the atoms of the body are potentially subject to scientific mastery and intentional control.
From patient to patient, there are commonalities. Places where things work the same. This allows shortcuts… transfer of predictions from one patient to another.
Since the body operates as it does for physical reasons, if a patient had a unique arrangement of atoms, that could produce a unique medical situation…
...and yet the unique medical situation will still obey the laws of physics and chemistry and biochemistry and so on. From such models, with lots of data, one could still hypothetically be very very confident even about how to treat a VERY unique organism.
Veterinarians tend to be better at first principles medicine than mere human doctors. There are fewer vet jobs, and fewer vet schools, and helping animals has more of a prestige halo among undergrads than helping humans, and the school applications are more competitive, and the domain itself is vastly larger, so more generic reasoning tends to be taught and learned and used.
If a single human doctor was smart and competent and thorough, they could have calibrated hunches about what tests the doctors who ran the “1% and 2% study” COULD have performed.
If a single doctor was smart and competent and thorough, they could look at the study that said “overall in humans in general in a large group: side effect X was 1% in controls and 2% with the real drug” AND they could sequence the entire genome of the patient and make predictions from this sequence data. The two kinds of data could potentially be reconciled and used together for the specific patient.
BUT, if a single doctor was smart and competent and thorough, they could ALSO (perhaps) have direct access to the list of all allergic reactions the patient is capable of generating because they directly sampled the antibodies in the patient, and now have a computerized report of that entire dataset and what it means.
Heck, with AlphaFold in the pipeline now, a hypothetical efficacy study could hypothetically have sequenced every STUDY patient, and every patient’s unique gene sequences and unique drug-target-folding could be predicted.
A study output might not be “effective or not” but rather just be a large computer model where the model can take any plausible human biodata package and say which reactions (good, bad, or interesting) the drug would have for the specific person with 99.9% confidence one way or the other.
Drugs aren’t magic essences. Their “non-100%” efficacy rates are not ontologically immutable facts. Our current “it might work, it might not” summaries of drug effects… are caused partly by our tolerance for ignorance, rather than only by the drug’s intrinsically random behavior.
We can model a drug as a magic fetish the patient puts in their mouth, and which sometimes works or sometimes doesn’t, as a brute fact, characterized only in terms of central tendencies…
...but this modeling approach is based on our limits, which are not set in stone.
Science is not over. Our doctors are still basically like witch doctors compared to the theoretical limits imposed by the laws of physics.
The current barriers to good medical treatment are strongly related to how much time and effort it takes to talk to people and follow up and measure things… and thus they are related to wealth, and thus economics, and thus economic regulatory systems.
Our government and universities are bad, and so our medical regulations are bad, and so our medicine is bad. It is not against the laws of physics for medicine to be better than this.
Concretely: do you have a physical/scientific hunch here? It kinda seems like you’re advocating “2% because that’s what the study said”?
What is the maximally structurally plausible probability of an allergic reaction, as a complication for that patient, in response to treatment: ~2% or ~11% or ~20%?
The patient’s body is made of atoms that move according to physical laws.
Yes, but making treatment decisions based on pathophysiological theories goes counter to what evidence-based medicine is about. The idea of this method is that it’s going to be used by doctors practicing evidence-based medicine.
You can argue that evidence-based medicine is a flawed paradigm and doctors should instead practice physical-law-based medicine (or whatever you want to call it) but that’s a more general discussion than the one about this particular heuristic.
This comment touches on the central tension between the current paradigm in medicine, i.e. “evidence-based medicine”, and an alternative and intuitively appealing approach based on a biological understanding of the mechanism of disease.
In evidence-based medicine, decisions are based on statistical analysis of randomized trials; what matters is whether we can be confident that the medication probabilistically has improved outcomes when tested on humans as a unit. We don’t really care too much about the mechanism behind the causal effect, just whether we can be sure it is real.
The exaggerated strawman alternative approach to EBM would be Star Trek medicine, where the ship’s doctor can reliably scan an alien’s biology, determine which molecule is needed to correct the pathology, synthesize that molecule and administer it as treatment.
If we have a complete understanding of what Nancy Cartwright calls “the nomological machine”, Star Trek medicine should work in theory. However, you are going to need a very complete, accurate and detailed map of the human body to make it work. Given the complexity of the human body, I think we are very far from being able to do this in practice.
There have been many cases in recent history where doctors believed they understood biology well enough to predict the consequences, yet were proved wrong by randomized trials. See for example Vinay Prasad’s book “Ending Medical Reversal”.
My personal view is that we are very far from being able to ground clinical decisions in mechanistic knowledge instead of randomized trials. Trying to do so would probably be dangerous given the current state of biological understanding. However, we can probably improve on naive evidence-based medicine by carving out a role for mechanistic knowledge to complement data analysis. Mechanisms seem particularly important for reasoning correctly about extrapolation; the purpose of my research program is to clarify one way such mechanisms can be used. It doesn’t always work perfectly, but I am not aware of any examples where an alternative approach works better.
Thanks!