I broadly disagree with the Hansonian take on medicine. I don't think MetaMed went bust because it offered more effective healing to a market that doesn't really demand healing. Rather, medicine is about healing, generally does this pretty well, and MetaMed was unable to provide a significant edge in performance over standard medicine. (I should note I am a doctor, albeit a somewhat contrarian one; I wrote the 80,000 Hours careers guide on medicine.)
I think medicine is generally less fertile ground for Hansonian signalling accounts, principally because health is so important for our life and happiness we’re less willing to sacrifice it to preserve face (I’d wager it is an even better tax on bs than money). If the efficacy of marginal health spending is near zero in rich countries, that seems to be evidence for ‘medicine is really about healing’: we want to live healthily so much that we chase the returns curve all the way to zero!
There are all manner of ways in which Western medicine does badly, but I think the faults are sometimes overblown, and the remainder are best explained by human failings rather than by medicine being a sham practice:
1) My understanding of diagnostic algorithms is that although linear regressions and other simple methods can beat humans at very precise diagnostic questions (e.g. ‘Given these factors in a patient who is mentally ill, what is their likelihood of committing suicide?’), humans still perform better in messier (and more realistic) situations. It’d be surprising for IBM to unleash Watson on a very particular aspect of medicine (therapeutic choice in oncology) if simple methods could beat doctors across most of the board.
(I’d be very interested to see primary sources if my conviction is mistaken.)
2) Medicine has become steadily more protocolized, and clinical decision rules, standard operating procedures and standards of care are proliferating rapidly. I agree this should have happened sooner: that Atul Gawande’s surgical checklist happened within living memory is amazing, but it is catching on, and (mildly against Hansonian explanations) has been propelled by better outcomes.
I can’t speak for the US, but there are clear protocols in the UK for the initial emergency management of heart attacks. Indeed, take a gander at the UK’s ‘NICE Pathways’, which gives a flowchart for how to act in every circumstance where a heart attack is suspected.
3) I agree that the lack of efficacy information about individual doctors isn’t great. Reliable data on this is far from trivial to acquire, however, and that difficulty, together with doctors’ understandable self-interest in not being too closely monitored, seems to explain this lacuna as well as the Hansonian story does. (Patients tend to want this information when it is available, which doesn’t fit well with them colluding with their doctors and family in a medical ritual unconnected to their survival.)
4) Over-treatment is rife, but the US is generally held up as an anti-exemplar of this fault, and (at least judging by the anecdotes) medics in the UK are better (albeit still far from perfect) at not flogging the patient to death with medical torture. Outside of this zero or negative margin, performance is better: it is unclear how much is attributable to medicine, but life expectancy and disease-free life expectancy are rising, and age-standardized mortality rates for most conditions are falling.
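The ‘simple methods’ in point 1 are typically additive risk scores or logistic regressions. As a minimal sketch only (the factor names, weights and intercept below are hypothetical and made up for illustration, not any published clinical score), the whole model reduces to a weighted sum passed through a logistic function:

```python
import math

# Hypothetical factors and weights, made up for illustration only.
# This is NOT a validated clinical model.
HYPOTHETICAL_WEIGHTS = {
    "risk_factor_1": 1.5,
    "risk_factor_2": 0.8,
    "risk_factor_3": 0.5,
}
HYPOTHETICAL_INTERCEPT = -4.0


def risk_probability(patient: dict) -> float:
    """Logistic risk score: p = 1 / (1 + exp(-(b0 + sum_i b_i * x_i))).

    `patient` maps factor names to values (1 = present, 0 = absent);
    factors missing from the dict are treated as absent.
    """
    z = HYPOTHETICAL_INTERCEPT
    for factor, weight in HYPOTHETICAL_WEIGHTS.items():
        z += weight * patient.get(factor, 0.0)
    return 1.0 / (1.0 + math.exp(-z))


# Usage: factors 1 and 3 present gives z = -4.0 + 1.5 + 0.5 = -2.0.
print(round(risk_probability({"risk_factor_1": 1, "risk_factor_3": 1}), 3))  # prints 0.119
```

The computation itself is trivial; the value of such a score depends entirely on fitting the weights to real outcome data, validating them, and collecting the inputs reliably.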
Now, on why MetaMed failed (I appreciate one should get basically no credit for predicting that a startup will fail, given this is the usual outcome, but I called it a long time ago):
MetaMed’s business model relied on there being a lot of low-hanging fruit to pluck: the idea that in many cases a diagnosis or treatment would elude the clinician because they weren’t apprised of the most recent evidence, could only deal in generalities rather than personalized recommendations, or were simply less adept at synthesizing the available evidence.
If it were MetaMed versus the average doctor (the one who spends next to no time reading academic papers, who is incredibly busy, stressed out, and so on), you’d be forgiven for thinking that MetaMed had an edge. However, medics (especially generalists) have long realized they have no hope of keeping abreast of a large medical literature on their own. Enter the division of labour: they instead commission the relevant experts to survey, aggregate and summarize the current state of the evidence base, leaving themselves the simpler task of applying it in their practice. To keep this up to date, they commission the experts to repeat the exercise fairly often.
I mentioned NICE (the National Institute for Health and Care Excellence) earlier. It is a UK body responsible (inter alia) for deciding which drugs and treatments get funded on the NHS, and it spends a vast amount of time on evidence synthesis and meta-analysis. To see the sort of work this produces, google ‘NICE {condition}’. An example for depression is here. Although I think the UK is world-leading in this respect, there are similar bodies in other countries, as well as commercial organizations (e.g. UpToDate).
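The core arithmetic of the meta-analysis mentioned above can be illustrated with the simplest textbook method, inverse-variance fixed-effect pooling. This is a minimal sketch, not a description of NICE’s actual methodology, which involves considerably more (random-effects models, risk-of-bias assessment and so on); the study estimates below are hypothetical:

```python
import math


def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance fixed-effect pooling of per-study effect estimates.

    Study i gets weight w_i = 1 / se_i**2. Returns the pooled estimate
    sum(w_i * y_i) / sum(w_i) and its standard error sqrt(1 / sum(w_i)).
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


# Usage with hypothetical log odds ratios from three made-up trials.
pooled, se = fixed_effect_pool([-0.2, -0.4, -0.1], [0.1, 0.2, 0.1])
low, high = pooled - 1.96 * se, pooled + 1.96 * se  # approximate 95% CI
```

More precise studies (smaller standard errors) get proportionally more weight in the pooled estimate.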
Against this, MetaMed never had any edge: they didn’t have groups of subject-matter experts to call upon for each condition or treatment in question, nor (despite a lot of mathsy inclination amongst them) did they by and large have parity in meta-analysis, evidence synthesis and related skills. They were also outmatched in the number of man-hours they could deploy, and by the great head start NICE et al. already had. When their website was still up I looked at some of their example reports, and my view was that they were significantly inferior to what you could get via NICE (for free!) or UpToDate and similar services (for much lower fees).
MetaMed might have had a hope if, in the course of producing these general evidence summaries, a lot of fine-grained data was being aggregated away to produce something ‘one size fits all’. Their edge would then be going back to the original data to find that although drug X is generally good for a condition, in one’s particular case, by virtue of age, genotype, or whatever else, drug Y is superior.
However, this data by and large does not exist: much of medicine is still at the stage of working out whether something works generally, rather than delving into differential response and efficacy. It is not clear it ever will—humans might be sufficiently similar to one another that for almost all of them one treatment will be the best. The general success of increasing protocolization in medicine is some further weak evidence of this point.
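To make that hoped-for edge concrete, here is a minimal sketch with hypothetical trial counts (the drugs, subgroups and numbers are all made up) in which the pooled comparison favours drug X while one subgroup does better on drug Y. This is the kind of signal MetaMed would have needed to recover from the original data:

```python
# Hypothetical trial counts, made up for illustration:
# subgroup -> drug -> (responders, patients).
HYPOTHETICAL_TRIAL = {
    "subgroup_A": {"X": (80, 100), "Y": (60, 100)},
    "subgroup_B": {"X": (10, 20), "Y": (15, 20)},
}


def pooled_rates(trial):
    """Response rate per drug after aggregating away the subgroups."""
    totals = {}
    for arms in trial.values():
        for drug, (responders, patients) in arms.items():
            prev_r, prev_n = totals.get(drug, (0, 0))
            totals[drug] = (prev_r + responders, prev_n + patients)
    return {drug: r / n for drug, (r, n) in totals.items()}


def best_drug_by_subgroup(trial):
    """Drug with the highest response rate within each subgroup."""
    return {
        group: max(arms, key=lambda drug: arms[drug][0] / arms[drug][1])
        for group, arms in trial.items()
    }


print(pooled_rates(HYPOTHETICAL_TRIAL))           # {'X': 0.75, 'Y': 0.625}
print(best_drug_by_subgroup(HYPOTHETICAL_TRIAL))  # {'subgroup_A': 'X', 'subgroup_B': 'Y'}
```

Here drug X wins overall (0.75 vs 0.625), but subgroup_B responds better to Y (0.75 vs 0.50). The argument above is that data at this level of detail by and large does not exist, so in practice such subgroup comparisons are rarely available.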
I generally adduce MetaMed as an example of rationalist overconfidence: the idea that insurgent Bayesians can just trounce the relevant professionals at what they purport to do, thanks to signalling etc. But again, given the expectation was for it to fail (as most startups do), this doesn’t provide evidence. If it had succeeded, I’d have updated much more strongly towards the magic of rationalism meaning you can win, and towards the world being generally dysfunctional.
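The point about failure carrying little evidence can be put in the odds form of Bayes’ rule. Writing H for the hypothesis that insurgent Bayesians can outperform the relevant professionals, and E for the observed outcome, a sketch:

$$
\frac{P(H \mid E)}{P(\neg H \mid E)} = \frac{P(E \mid H)}{P(E \mid \neg H)} \cdot \frac{P(H)}{P(\neg H)}
$$

With E = failure, both likelihoods are high, because most startups fail whether or not H is true, so the likelihood ratio is close to 1 and the odds barely move. With E = success, P(E | ¬H) is small, so the ratio is well above 1 and the update is large. No specific probabilities are given here; this only shows the shape of the argument.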
principally because health is so important for our life and happiness we’re less willing to sacrifice it to preserve face (I’d wager it is an even better tax on bs than money).
I agree: I expect people to be more willing to trade money for face than health for face. I think the system is slanted too heavily towards face, though.
I should also point out that this is mostly a demand-side problem. If it were only a supply-side problem, MetaMed could have won, but it isn’t: people are more interested in face than in health (see the example of the outdated brochure that was missing the key medical information but looked the way a medical brochure is supposed to look).
It’d be surprising for IBM to unleash Watson on a very particular aspect of medicine (therapeutic choice in oncology) if simple methods could beat doctors across most of the board.
My understanding is that this is correct for the simple techniques but incorrect for the complicated ones. That is, you’re right that a single linear regression can’t replace a GP, but an NLP engine plus a twenty-questions bot plus a causal network probably could. (Unfortunately I don’t have any primary sources at hand; medical diagnostics is an interest of mine, but most of the academic citations I know are in machine diagnostics, since that’s what my research was in.)
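The ‘twenty-questions bot’ component can be sketched as greedy question selection by expected information gain over a distribution of diagnoses. This is a hypothetical illustration, not any real product’s method: it assumes a hand-specified table of P(yes | diagnosis) for each question and treats answers as conditionally independent given the diagnosis, a naive-Bayes simplification that a real causal network would relax.

```python
import math


def entropy(dist):
    """Shannon entropy (bits) of a {diagnosis: probability} dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def posterior(prior, likelihood, question, answer):
    """Bayes update of P(diagnosis) after a yes/no answer.

    likelihood[dx][question] is the assumed P(answer is yes | dx).
    """
    unnorm = {}
    for dx, p in prior.items():
        p_yes = likelihood[dx][question]
        unnorm[dx] = p * (p_yes if answer else 1.0 - p_yes)
    total = sum(unnorm.values())
    return {dx: v / total for dx, v in unnorm.items()}


def expected_entropy(prior, likelihood, question):
    """Posterior entropy averaged over the two possible answers."""
    p_yes = sum(p * likelihood[dx][question] for dx, p in prior.items())
    result = 0.0
    for answer, p_answer in ((True, p_yes), (False, 1.0 - p_yes)):
        if p_answer > 0:  # skip answers the model deems impossible
            result += p_answer * entropy(posterior(prior, likelihood, question, answer))
    return result


def best_question(prior, likelihood, questions):
    """Greedy choice: minimise expected posterior entropy (max info gain)."""
    return min(questions, key=lambda q: expected_entropy(prior, likelihood, q))


# Hypothetical diagnoses and questions: q1 is uninformative, q2 is not.
HYPOTHETICAL_PRIOR = {"dx_a": 1 / 3, "dx_b": 1 / 3, "dx_c": 1 / 3}
HYPOTHETICAL_LIKELIHOOD = {
    "dx_a": {"q1": 0.9, "q2": 0.9},
    "dx_b": {"q1": 0.9, "q2": 0.1},
    "dx_c": {"q1": 0.9, "q2": 0.5},
}
print(best_question(HYPOTHETICAL_PRIOR, HYPOTHETICAL_LIKELIHOOD, ["q1", "q2"]))  # prints q2
```

Minimising expected posterior entropy is equivalent to maximising information gain, since the prior entropy is the same for every candidate question. After each answer, the posterior becomes the prior for choosing the next question.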
I should also mention that, from the ML side, Watson’s technical innovation is in the NLP engine. That is, a patient could type English into a keyboard and Watson would mostly understand what they’re saying, instead of needing a nurse or doctor to translate the English into the format the diagnostic tool needs. The main obstacle to uptake of the simple techniques historically was that they only did the final computation, whereas most of the work in diagnostics is collecting the information from the patient. So if the physician is 78% accurate and the linear regression is 80% accurate, is it really worth running the numbers for those extra two percentage points?
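Taking those hypothetical accuracy figures at face value, the same gap can be restated in terms of error rates:

$$
1 - 0.78 = 0.22, \qquad 1 - 0.80 = 0.20, \qquad \frac{0.22 - 0.20}{0.22} \approx 0.091
$$

That is, the regression would make roughly 9% fewer errors, or about 20 more correct diagnoses per 1,000 patients; whether that justifies the extra work of collecting and entering the inputs is the question being asked.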
From a business standpoint, I think it’s obvious why IBM is moving slowly; just as with self-driving cars, the hard problems are primarily legal and social, not technical. Even if Watson had half the error rate of a normal doctor, its legal liability status would be very different, just as a self-driving car with half the error rate of a human driver would result in more lawsuits for the manufacturer, not fewer. Also, if the end goal is to replace doctors, the right way to do that is to imperceptibly hand more and more work over to the machines, not to jump out of the gate with a “screw you, humans!”
I agree this should have happened sooner: that Atul Gawande’s surgical checklist happened within living memory is amazing, but it is catching on, and (mildly against Hansonian explanations) has been propelled by better outcomes.
So, just as the Hansonian view of Effective Altruism is that it replaces Pretending to Try not with Actually Trying but with Pretending to Actually Try, if there is sufficient pressure to pretend to care about outcomes, we should expect people to move towards better outcomes, since their pretending involves nonzero effort.
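But I think you can compare the historical spread of anesthesia with the historical spread of antiseptics to get a sense of the relative importance of physician convenience and patient outcomes: anesthesia, which also made the surgeon’s job easier, spread far faster than antisepsis, whose benefits fell mainly to patients. (This is, I think, a point Gawande makes.)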
I think I agree with your observations about MetaMed’s competition, but not necessarily with your interpretation. That is, MetaMed could easily have failed both because its competition was strong and because its customers weren’t willing to pay for its services. I put more weight on the latter because the experience MetaMed reported was mostly not “X doesn’t want to pay $5k for what they can get for free from NICE” but “X agrees that this is worth $100k to them, but would like to pay me only $5k for it.” (This could easily be a selection effect, where everyone who would choose NICE instead is silent about it.)
However, this data by and large does not exist: much of medicine is still at the stage of working out whether something works generally, rather than delving into differential response and efficacy. It is not clear it ever will—humans might be sufficiently similar to one another that for almost all of them one treatment will be the best. The general success of increasing protocolization in medicine is some further weak evidence of this point.
This is why I’m most optimistic about machine medicine: it basically means that instead of going to a doctor (who is tired / stressed / went to medical school twenty years ago and only sort of keeps up), you go to the interactive NICE protocol bot, which asks you questions, looks at your SNPs and tracked weight / heart rate / steps / sleep / etc. data, calls in a nurse or technician to investigate a specific issue, diagnoses the issue and prescribes treatment, then follows up and adjusts its treatment outcome expectations accordingly.
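The final step, following up and adjusting treatment outcome expectations, could in its simplest form be a Beta-Bernoulli update. A minimal sketch, assuming each follow-up yields a binary success/failure and starting from a uniform Beta(1, 1) prior; the class and its names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class TreatmentBelief:
    """Beta(alpha, beta) belief about one treatment's success probability."""

    alpha: float = 1.0  # prior pseudo-count of successes (Beta(1, 1) = uniform)
    beta: float = 1.0   # prior pseudo-count of failures

    def update(self, succeeded: bool) -> None:
        """Conjugate update: each follow-up adds one success or one failure."""
        if succeeded:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def expected_success(self) -> float:
        """Posterior mean of the success probability."""
        return self.alpha / (self.alpha + self.beta)


# Usage: three follow-ups (two successes, one failure).
belief = TreatmentBelief()
for outcome in (True, True, False):
    belief.update(outcome)
print(belief.expected_success)  # prints 0.6
```

A real system would condition these beliefs on patient characteristics rather than keeping a single number per treatment.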
(Sorry for the delay, and thanks for the formatting note.)
My knowledge of machine medicine is not very up to date, but I did get to play with some of the commercially available systems, and I wasn’t hugely impressed. There may be much more impressive results yet to be released commercially, but (appealing back to my priors) I think I would have heard of them, as they would be a game-changer for global health. Also, if knowledge work as advanced as primary care can be done by computer, I’d expect a lot of jobs without medicine’s protective features to be automated too.
I agree that machine medicine along the lines you suggest will be superior to human performance, and I anticipate this will be achieved fairly soon (even if I am right that it hasn’t happened already). I think medicine will survive less through the cognitive skill it requires than through technical facility and social interaction, where machines comparatively lag (although, of course, I anticipate they will steadily get better at these too).
I grant that a Hansonian account can accommodate the sort of ‘guided by efficacy’ data I suggest via ‘pretending to actually try’ considerations, but I would suggest this almost becomes an epicycle: any data that supports medicine being about healing can be explained away by the claim that it is only pretending to be about healing as a circuitous route to signalling. I would say the general ethos of medicine (EBM, the proliferation of trials) gives pro tanto reasons in favour of it being about healing, and divergences from this (e.g. what happened to Semmelweis, other lags) are better explained by doctors being imperfect and selfish, and patients irrational, than by both parties adeptly following a signalling account.
But I struggle to see what evidence could neatly distinguish between these cases. If you have an idea, I’d be keen to hear it. :)
I agree with the selection worry re. MetaMed’s customers: they are also presumably selected from people whom modern medicine didn’t help, which may have further effects (not least making MetaMed’s task harder, as its pool will be harder to treat than patients not selected for failure who see the doctor ‘first line’). I’d also suggest (with all respect to MetaMed’s staff) that they may not be the most objective sources on why it failed: I’d guess people would prefer to say their startup failed because of the market or product-market fit, rather than ‘actually, our product was just worse than the competition’.
But I struggle to see what evidence could neatly distinguish between these cases. If you have an idea, I’d be keen to hear it. :)
I’m not sure there’s much of a difference between the “doctors care about healing, but run into imperfection and selfishness” interpretation and the “doctors optimize for signalling, but that requires some healing as a side effect” interpretation, besides which piece goes before the ‘but’ and which goes after.
The main difference I do see is that if ‘selfishness’ means ‘status’, we might see different defection than if ‘selfishness’ means ‘greed’. I’m not sure there’s enough difference between them for a clear comparison to be made, though. Greedy doctors will push patients towards costly but unnecessary procedures, but status-seeking doctors will also push patients towards costly but unnecessary procedures, because it makes them seem more important and necessary.