Prediction-based medicine (PBM)
We need a new paradigm for doing medicine. I make the case by first speaking about the problems of our current paradigm of evidence-based medicine.
The status quo of evidence-based medicine
While biology moves forward and the cost of genetic sequencing has dropped a lot faster than Moore’s law would predict, the opposite is true for the development of new drugs. In the current status quo the cost of developing new drugs rises exponentially, following Eroom’s law. While average lifespan in Canada increased greatly over the last century, the average life span at age 90 increased by only 1.9 years over the same period. In 2008 the Centers for Disease Control and Prevention reported that life expectancy in the US declined from 77.9 to 77.8 years. According to World Bank data, Germany increased average lifespan by two years over the last decade, which is not enough for the dream of radical lifespan increases in our lifetime.
When it costs 80 million to test whether an intervention works and most attempts show that the intervention doesn’t work we have a problem. We end up paying billions for every new intervention.
Eric Ries wrote “The Lean Startup”. In it he argues that it’s the job of a startup to produce validated learning. He proposes that companies that work with small batch sizes can produce more innovation because they can learn faster how to build good products. The existing process in medicine doesn’t allow for small batch innovation because the measuring stick for whether an intervention works is too expensive.
In addition the evidence-based approach rests on the assumption that we don’t build bespoke interventions for every client. As long as a treatment doesn’t generalize across multiple different patients, it’s not possible to test it with a trial. In principle a double-blind trial can’t give you evidence that a bespoke intervention that targets the specific DNA profile of a patient and his co-morbidity works.
The ideal of prediction-based medicine
The evidence-based approach also assumes that practitioners are exchangeable. It doesn’t model the fact that different physical therapists or psychologists have different skill levels. It doesn’t provide a mechanism to reward highly skilled practitioners; instead it treats every practitioner who uses the same treatment intervention the same way.
Its strong focus on asking whether a treatment beats a placebo in double-blind studies makes it hard to compare different treatments against each other. In the absence of an ability to predict the effect sizes of different drugs from the literature, the treatment that wins on the market is often the treatment that’s best promoted by a pharmaceutical company.
How could a different system work? What’s the alternative to making treatment decisions based on big and expensive studies that provide evidence?
I propose that a treatment provider should provide a patient with the credence that the treatment provider estimates for treatment outcomes that are of interest to the client.
If Bob wants to stop smoking and asks doctor Alice whether the treatment Alice provides will result in Bob not smoking in a year, Alice should provide him with her credence estimation. In addition Alice’s credence estimations can be entered in a central database. This allows Bob to see Alice’s Brier score that reflects the ability of Alice to predict the effects of her treatment recommendations.
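As a rough illustration of how such a track record could be scored, the sketch below computes a Brier score: the mean squared difference between the stated credence and what actually happened (1 for success, 0 for failure), where lower is better. The function name and Alice’s forecast data are hypothetical, made up only for the example.

```python
def brier_score(forecasts):
    """Mean squared error between stated credences and binary outcomes.

    `forecasts` is a list of (credence, outcome) pairs, where credence is a
    probability in [0, 1] and outcome is 1 if the predicted result happened
    and 0 otherwise. Lower is better; always answering 0.5 scores 0.25.
    """
    if not forecasts:
        raise ValueError("need at least one forecast")
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)


# Hypothetical track record for Alice: (credence that the patient is
# smoke-free after one year, what actually happened).
alice = [(0.8, 1), (0.6, 1), (0.7, 0), (0.3, 0), (0.9, 1)]
print(round(brier_score(alice), 3))
```

A real database would also need enough forecasts per provider for the score to be meaningful; a handful of cases like the ones above says very little.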
In this framework Alice’s expertise isn’t backed up by having gotten an academic degree and recommending interventions that are studied with expensive gold-standard studies. Her expertise is backed by her track record.
This means that Alice can charge money based on the quality of her skills. If Alice is extremely good she can make a lot of money with her intervention without having to pay billions for running trials.
Why don’t we pay doctors in the present system based on their skills? We can’t measure their skills in the present paradigm, because we can’t easily compare the outcomes of different doctors. Hard patients get sent to doctors with good reputations and as a result every doctor has an excuse for getting bad outcomes. In the status quo a doctor can just assert that his patients were hard.
In prediction-based medicine a doctor can write down a higher credence for a positive treatment outcome for an easy patient than for a hard patient. Patients can ask multiple doctors and are given good data to choose the treatment that provides the best outcome for which they are willing to pay.
In addition to giving the patient a more informed choice about the advantages of different treatment options, this process helps the treatment provider to increase his skills. He learns where he makes errors in the estimation of treatment outcomes.
The provider can also innovate new treatments in small batches. Whenever he understands a treatment well enough to make predictions about its outcomes he’s in business. He can easily iterate on his treatment and improve it.
The way to bring prediction-based medicine into reality
I don’t propose to get rid of evidence-based medicine. It has its place and I don’t have any problem with it for the cases where it works well.
It works quite poorly for body work interventions and psychological interventions that are highly skill based. I have seen hypnosis achieve great effects but at the same time there are also many hypnotists who don’t achieve great effects. In the status quo a patient who seeks hypnosis treatment has no effective way to judge the quality of the treatment before he buys.
A minimum viable product might be a website that’s Uber for body workers and hypnotists. The website lists the treatment providers. The patient can enter his issue and every treatment provider can offer his credence of solving the issue of the patient and the price of his treatment.
Before getting shown the treatment providers, a prospective patient would take a standardized test to diagnose the illness. The information from the standardized test will allow the treatment providers to make better predictions about the likelihood that they can cure the patient. Other standardized tests that aren’t disease specific, like the OCEAN personality index, can also be provided to the patient.
Following the ideas of David Burns’s TEAM framework, the treatment provider can also tell the patient to take tests between treatment sessions to keep better track of the progression of the patient.
When making the purchasing decision the patient agrees to a contract that includes him paying a fine, if he doesn’t report the treatment outcome after 3 months, 6 months and 1 year. This produces a comprehensive database of claims that allows us to measure how well the treatment providers are calibrated.
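One simple way to measure calibration from such a database is to group a provider’s predictions by stated credence and compare each group’s average credence with how often the outcome actually occurred. The sketch below does this; the bin count, function name and sample reports are assumptions chosen for illustration, not part of the proposal.

```python
from collections import defaultdict


def calibration_table(forecasts, n_bins=5):
    """Group (credence, outcome) pairs into credence bins.

    Returns one row per non-empty bin:
    (bin index, number of forecasts, mean stated credence, observed success rate).
    For a well-calibrated provider the last two values are close in every bin.
    """
    bins = defaultdict(list)
    for credence, outcome in forecasts:
        # Clamp so that a credence of exactly 1.0 lands in the top bin.
        index = min(int(credence * n_bins), n_bins - 1)
        bins[index].append((credence, outcome))

    rows = []
    for index in sorted(bins):
        items = bins[index]
        mean_credence = sum(c for c, _ in items) / len(items)
        success_rate = sum(o for _, o in items) / len(items)
        rows.append((index, len(items), mean_credence, success_rate))
    return rows


# Hypothetical reported outcomes for one provider.
reports = [(0.9, 1), (0.9, 1), (0.9, 0), (0.1, 0), (0.5, 1), (0.5, 0)]
for row in calibration_table(reports):
    print(row)
```

In this made-up data the provider says 90% three times and succeeds twice, which hints at mild overconfidence, though three cases are far too few to conclude anything.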
Various Quantified Self gadgets can be used to gather data. Many countries have centralized electronic health records that could be linked to a user account.
The startup has a clear business model. It can take a cut of every transaction. It has strong network effects and it’s harder for a treatment provider to switch because all his prediction track record is hosted on the website.
Thanks to various people from the Berlin LessWrong crowd who gave valuable feedback for the draft of this article.
As a rule, practitioners are currently very averse to giving credence estimations to patients. Chesterton’s Fence applies: understand thoroughly why the fence is there before tearing it down. Here are some possible reasons.
practitioners are unskilled in that estimation; their heuristics output a decision instead
patients are unskilled in interpreting that estimation and on average providing it would be harmful
hospitals disincentivize providing that information
insurance companies disincentivize providing that information
patients would rather be told what to do and would be averse to doctors otherwise
doctors would rather tell what to do and would be demoralized otherwise
I’m sure there are more. For each, if it were the main driver of the current situation, what would happen if you tried a startup that tore down the fence?
In general, experts in all fields are averse to giving credence estimations to their customers because it allows the customers to hold them accountable for bad advice.
A homeopath who would have to provide credence estimations might lose his job as a result. People in mainstream medicine with similar outcomes would also lose their jobs.
Even scientists don’t like telling you their credence for an experiment finding a statistically significant outcome before they run the experiment.
On a more metaphysical level various fields want objective knowledge that’s true regardless of the subjective judgement of a person. The search for transcendent absolute truth prevents people from being public with their subjective credence judgments.
Yes, they would need to learn to do proper estimations. Just as the Good Judgment Project found that there are certain superforecasters in political domains, we are likely to find that some doctors are much better at forecasting than others. It’s very valuable to find out who’s good at making those forecasts. Having strong economic pressure to develop the ability to forecast medical decisions is very useful for more broadly developing our way of understanding the human body and developing new treatments as well.
A patient might not be perfect at understanding what paying $600 for a treatment with a 60% chance of success means, but if he’s shown various treatment options and there’s one treatment for $600 that has a 30% chance of success and one for $600 that has a 60% chance of success, he can still see which one is the better deal.
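One hypothetical way a website could put such offers on a common scale is to divide the price by the stated probability of success, giving a rough “price per expected success”. This is only a sketch: the metric, the function name and the offer labels are assumptions, and it ignores side effects, partial improvements and how well calibrated each provider is.

```python
def cost_per_expected_success(price, credence):
    """Price divided by the stated probability that the treatment works."""
    if not 0 < credence <= 1:
        raise ValueError("credence must be in (0, 1]")
    return price / credence


# The two offers from the example: same price, different stated credence.
offers = {"treatment A": (600, 0.30), "treatment B": (600, 0.60)}

# Sort so that the cheapest offer per expected success comes first.
ranked = sorted(offers, key=lambda name: cost_per_expected_success(*offers[name]))
for name in ranked:
    price, credence = offers[name]
    print(name, round(cost_per_expected_success(price, credence)))
```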
There’s a lot of impetus in pretending that we provide the best possible medicine to every person. If this project went through, we would see that the person who gets the 1-bed hospital room might get not only more comfort than the person with the 4-bed hospital room but also a better medical outcome. A lot of people dislike the idea of putting a cost on a human life. I think we should know what a good medical outcome costs for similar reasons as we want evidence in EA about what it costs to save a human life.
I like this idea. I also like how it is presented.
I think resting the approach on prediction is great, but as stated (the doctor making the predictions) it forces doctors to be good at both prediction and treatment. I think that this is basically a good idea, as mentioned here, but in practice a solution would be needed to split the responsibilities; otherwise you likely get mediocre estimates combined with mediocre treatment.
I don’t think the skill of predicting how likely pill A and pill B are to cure the patient is independent of the skill of deciding whether to recommend pill A or pill B to the patient. Training the skill of making the predictions should help with the skill of making treatment decisions.
In the proposed framework the treatment provider also isn’t completely alone at making the predictions. He makes the predictions over the website interface and the website has access to a lot of data on which it can run machine-learning algorithms. The website can help a treatment provider make better predictions than he would make otherwise.
It’s also worth saying that while the average treatment provider might combine mediocre estimates with mediocre treatments, there likely will be treatment providers who combine good estimates with good treatment. This framework means we would know the identity of those people. They could charge more money for their services and get status. Their colleagues would try to replicate their skills.
They might write a book about the topic and their colleagues would devour it to learn their insights. Companies that develop new treatments would consult with those people to waste less money on developing treatments that don’t work.
Do you have any ideas on how to ask people about outcomes, so that the system can distinguish between effects of treatments versus personalities/styles of reporting/biases of patients?
I fear that the best doctors in your proposed system would be those who are very skilled at predicting how optimistic a patient would be, and how willing to give good reviews, with only a minor part of this being the effectiveness of the treatment.
David Burns is one of the people who popularized CBT by writing the Feeling Good Handbook. He developed a variety of standardized psychiatric questionnaires to score whether a person has depression, generalized anxiety disorder or other illnesses. Those tests are designed so that a patient can fill them out without supervision and they provide a good measure of the severity of an illness.
With his paradigm of doing psychology, which he calls TEAM, David Burns advocates that psychologists should use those scores as a guideline to treat patients. TEAM is completely paper based but I think that system is valuable. I have never seen it in practice but I think the arguments that David Burns makes for going from CBT to TEAM make sense.
I also think that you can ask questions about problems such as back pain or an allergy in a way where someone who suffers from the illness will give different answers when he is cured than when he isn’t cured.
I would also assume that health goals such as losing weight or stopping smoking have clear enough outcomes that you can find out whether the intervention worked by asking a patient well crafted questions.
While we are on the subject of timelines, I would also ask again after 2 years, 3 years and 5 years to get data on more long-term effects of an intervention. It’s valuable data for a subject like weight loss or smoking cessation, but unfortunately calibration for those claims won’t be available at the start of the project to allow patients to make purchasing decisions based on them.
Many conditions have self-assessment questionnaires used in the research community. They could be given to patients at intervals.
That is one difficulty, but I expect a bigger and more fundamental difficulty is just that there’s lots of random noise in how patients respond to medical treatments.
Thought experiment: suppose every doctor were replaced by identical computers all running the same treatment-recommending software. How much does the variance in patient outcomes decrease? My gut says not very much. If it’s right, most variance isn’t doctor-level, it’s going to be higher-level (at the level of a disease or a hospital/clinic) or lower-level (patient-level).
To me the most obvious analogy is teaching. A standard finding in education research is that classroom/teacher-level variation is only a small part of the variation in educational outcomes. (Doing a quick Google...tables 1 & 5 of this highly-cited paper suggest it’s typically ~ 10% of the variance.) Education, like healthcare, is very expensive, mostly carried out for laypeople by trained specialists, and is generally considered (excepting people promoting their own one-size-fits-all solutions) a really knotty & complex thing to do, so I take the analogy seriously.
This matters because it means individual practitioners are going to have a hard time beating the EBM approach at estimating treatment effects, because the statistical win of assessing treatments at the finer-grained level of the practitioner is going to be more than cancelled out by the statistical loss of each practitioner having a smaller sample to refer to.
Imagine going from a multi-centre study of 40 specialists treating 1,600 people, to each specialist knowing about only their 40 patients. Each specialist then has only 1/40th the information they would’ve had, and that’s going to negate the slight gain of eliminating the effect of different specialists. (The specialists could of course tell each other about their results, but then one’s basically back to the large-scale, expensive EBM-style approach, and the agile, startuppy USP is lost.)
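To put rough numbers on that trade-off, the sketch below compares the standard error of an estimated success rate from 1,600 pooled patients with that from a single specialist’s 40 patients. It assumes a simple binary outcome, independent patients and a hypothetical true success rate of 50%; none of these come from a real study.

```python
import math


def standard_error(success_rate, n_patients):
    """Standard error of an observed proportion under a binomial model."""
    return math.sqrt(success_rate * (1 - success_rate) / n_patients)


pooled = standard_error(0.5, 1600)  # multi-centre study, all patients pooled
single = standard_error(0.5, 40)    # one specialist looking only at own patients

print(f"pooled: {pooled:.4f}")
print(f"single specialist: {single:.4f}")
print(f"ratio: {single / pooled:.2f}")
```

Under these assumptions, having 1/40th of the data makes the estimate about sqrt(40), or roughly 6.3, times noisier, so any doctor-level signal has to be large to show up in a single specialist’s records.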
Allowing for doctor-level effects in analysis of treatments could help things, but I predict it would be a small improvement, and an improvement produced by extending the EBM approach, rather than building a parallel track to it.
Let’s look at schools. Having a master’s degree in teaching doesn’t result in the teacher’s students getting higher grades on standardized tests. Unions still press schools to pay people who have a master’s degree in education more money and mostly succeed at opposing pay-for-performance.
Education is a good example of a field where teachers are paid for useless training instead of being paid for producing outcomes for students.
Bill Gates suggests in his TED talk:
Given that citing a TED talk as a response to a scientific paper is a bit of bad form, I added a question on Skeptics to verify the claim.
I think part of the problem of the cited study is likely that it’s done on data from before No Child Left Behind and the efforts of the Gates Foundation. It’s also simply possible to find groups of teachers where there’s little variance in teaching skill; that doesn’t mean that differences don’t exist on a larger scale.
There are treatments that require high skill to administer, like brain surgery, and treatments that require less skill, like handing over a pill. For the high skill tasks I do think that variance in patient outcomes would decrease.
But even for simply taking a pill a doctor can spend five minutes to hand over the pill or he can spend an hour to talk through the issue with the patient and make sure that the patient has TAPs to actually take the pills according to the schedule.
Currently there’s no economic reason to spend that hour. It can’t be billed to the insurance company. Even if currently everybody spends five minutes on such a patient, you would suddenly get variance if you started to pay some doctors by the outcome instead of simply paying them per visit.
I don’t think that large sample sizes are everything. I think you can learn a lot by looking very carefully at the details of single cases.
Additionally the person who’s the best in city X at treating disease Y for class of patients Z might get more than 40 patients of class Z with Y because everybody wants to be treated by the best.
If all those expensively trained doctors achieve the same outcomes, there’s also a question of why we limit their supply as strongly as we presently do by forcing them to have the expensive training. In that case the way to go would be to get people with cheaper training to compete in the market with the highly trained doctors. In the absence of performance tracking this won’t be possible because the expensively trained doctors have more prestige.
That doesn’t surprise me. Training and degrees are easily observable; outcomes (or, rather, how much of an outcome is attributable to each teacher) are not. It’s harder to pay people based on something less observable.
Thanks!
But I’m not sure it matters. What Bill Gates says in that quotation might well be consistent with what I wrote — it’s hard to be sure because he’s quite vague.
For instance, he says that a teacher in the top quartile increases their class’ performance “by over 10 percent in a single year”. I can believe that, but maybe all it means is that a top-quartile teacher increases their class’ scores by 11% a year, while a bottom-quartile teacher increases their class’ scores by 9% a year. That would hardly refute the idea that teacher effects are small on average!
Maybe. I did a bit more searching on Google Scholar but didn’t uncover a more recent review article with relevant statistics. (I did find one study of a thousand students in HSGI schools in “school year 2013-2014”, which found that 12% of the variance in GPA was at the teacher level.)
However, while the study I cited uses only pre-2004 data, I don’t see much reason to think a newer study would reveal a big increase in teacher-level variance in outcomes. The low proportion of variance attributable to teachers has been true, as far as I know, for as long as people have investigated it (at least 46 years). I’m doubtful that an act which demands high-stakes testing, improvements in average state-wide test scores, and state-wide standards for teachers has changed that, or that a charity has changed that.
It’s possible, but I don’t see evidence that the paper I cited has this flaw. Its new analysis was based on about 100 Tennessee schools included in Project STAR, and the earlier analyses it summarized (table 1) used “samples of poor or minority students”, “nationally representative samples of students”, and “a large sample of public school students in Texas”.
Fair point — my thought experiment only addresses diagnosis and treatment recommendation, not treatment administration. I think there would be more doctor-level variation among the latter...although not much more in absolute terms.
Probably true in most places.
Agreed that you’d get more variance, but I suspect it wouldn’t be much more (subject to this hypothetical’s exact details).
This is certainly true. I’ve seen too many stories of people with rare genetic conditions and other diseases successfully working out aetiologies to think otherwise. But those are unusual cases where people tended to invest lots of effort into figuring their cases out. I don’t see them as signs that we can improve the quality-to-cost ratio of healthcare in general by looking very carefully at the details of each case.
Sure. But the ratios in my example are more important than the exact numbers.
A good question, and a good answer to the question!
It’d be immediately possible in the US: eliminate de facto immigration barriers for foreign doctors. Those barriers are, I’d guess, lower in other developed countries (hence why doctors in the UK, Germany, Canada, etc. earn less than US doctors) but I expect doctors’ salaries there could also be reduced a bit by further relaxing immigration restrictions for foreign doctors.
Another option is for patients to go to the cheaper doctors: medical tourism.
I understand him to be speaking about them increasing 10% more than non-top quartile teachers.
E.g. enough to close the US-Asia difference in two years and also enough to close the Black-White difference in four years, as suggested in the answer to the Stackexchange question.
It’s worth noting that medical error is the third leading cause of death in the US http://www.bmj.com/content/353/bmj.i2139
The article doesn’t only describe immigration barriers but also barriers of credentialism. The non-US degree often isn’t enough to work in the US as a doctor because its quality is in doubt. If there were hard evidence that those doctors perform as well as US doctors, the ability to seek rents via credentialism would be reduced.
I looked at that study. It seems their best predictors were (i) Fall algebra EOC (End Of Course) scores, (ii) English language learner (ELL) status, (iii) Black student status, and (iv) Hispanic student status.
Of course you make a better prediction of student performance on a standardized test when you look at the last standardized test they took than when you look at whether or not they had a very good teacher for a single school year. The 12% variance under that setting might be compatible with the claims that Gates makes. 12% variance per year might compound over multiple years to bridge the gap between the US and Asia in two years and the US Black-White gap in four years.
OK, thanks for clarifying. That sounds like a more impressive effect. At the same time, it’s probably still consistent with teacher quality explaining only 10% of the variance in student performance.
I’ll do back-of-envelope arithmetic to demonstrate. The median top-quartile teacher is at the 88th percentile. The median non-top quartile teacher is at the 38th. Suppose, just to allow me to arrive at concrete numbers, teacher quality has a normal distribution. Then the median top-quartile teacher is 1.48 standard deviations better than the median non-top quartile teacher. Now, an R^2 of 10% implies a correlation of sqrt(10%) = 0.32 between teacher quality and pupil performance, so the difference in pupil performance between the median non-top quartile teacher and the median top-quartile teacher is 1.48 * 0.32 = 0.47 standard deviations. That’s a statistically detectable effect, and one that could well translate into 10% higher test scores after a year with the better teachers.
Plausible. If I remember correctly the black/white difference is about 1 standard deviation, so if my estimated effect size of 0.47 SD for good vs. less good teachers is accurate and can be built on year by year, it’s enough to close the black/white difference in 2 to 3 years. I don’t know the US-Asia difference but probably the same kind of logic applies.
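The back-of-envelope arithmetic above can be checked with Python’s standard library. The sketch makes the same assumptions as the text (normally distributed teacher quality, an R^2 of 10%, and a gap of about 1 standard deviation to close) and computes each quantity rather than hard-coding it; the variable names are made up for the example.

```python
from math import sqrt
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Gap between the 88th and 38th percentile teacher, in teacher-quality SDs.
gap_in_teacher_sd = standard_normal.inv_cdf(0.88) - standard_normal.inv_cdf(0.38)

# An R^2 of 10% corresponds to a correlation of sqrt(0.10).
correlation = sqrt(0.10)

# Expected difference in pupil performance, in pupil-performance SDs.
effect_in_pupil_sd = gap_in_teacher_sd * correlation

# Years needed to close a 1 SD gap if the yearly effect simply adds up.
years_to_close_one_sd = 1 / effect_in_pupil_sd

print(f"teacher gap: {gap_in_teacher_sd:.2f} SD")
print(f"correlation: {correlation:.2f}")
print(f"pupil effect: {effect_in_pupil_sd:.2f} SD per year")
print(f"years to close 1 SD: {years_to_close_one_sd:.1f}")
```

Whether yearly effects really add up linearly is itself an assumption; gains may fade or compound differently in practice.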
Agreed, medical error is a real & substantial issue. I am just dubious about the ability of some proposals to inexpensively reduce fatal medical error. (But I am optimistic about others. Checklists seem promising.)
The way I would put it is that the credentialism barriers are the immigration barriers. AFAIK the explicit immigration barriers for foreign doctors looking to enter the US and practice in the US aren’t the bottleneck; the requirement that the doctor do a US residency programme, or have a degree from a US school, is a much stronger de facto bar to immigrating.
I agree with your last paragraph.
In the present system there aren’t strong economic incentives to reduce medical error. If you consider Checklists to be promising, then the lack of any economic incentives to use their virtues might be part of the reason why they don’t get adopted.
The incentive system of doing procedures that can be billed because they are included in a list of billable procedures and doing them in a defensive way that survives a lawsuit is bad. It means that money is wasted for procedures that cost a lot of money and provide little benefit. It also means that policies such as checklists (if we grant them to work) don’t get incentivised.
The whole system is unable to incentivise cheap solutions. Scott’s post about the inability of a hospital to prescribe Melatonin to its patients is illustrative:
This is why the story of Ramelteon scares me so much – not because it’s a bad drug, because it isn’t. But because one of the most basic and useful human hormones got completely excluded from medicine just because it didn’t have a drug company to push it. And the only way it managed to worm its way back in was to have a pharmaceutical company spend a decade and several hundred million dollars to tweak its chemical structure very slightly, patent it, and market it as a hot new drug at a 2000% markup.
From a political perspective immigration and credentialism are two different subjects, you have to convince different constituencies to create change.
I think this is broadly correct, certainly in the case of the US medical system.
Yes, from the standpoint of effecting political change, one might have to treat them as two different subjects, even though w.r.t. doctors in the US the two greatly overlap.