Three main sources. (But first the disclaimer About Isn’t About You seems relevant—that is, even if medicine is all a sham (which I don’t believe), participating in the medical system isn’t necessarily a black mark on you personally.)
First is Robin Hanson’s summary of the literature on health economics. The medicine tag on Robin’s blog has a lot, but a good place to start is probably Cut Medicine in Half and Medicine as Scandal, followed by Farm and Pet Medicine and Dog vs. Cat Medicine. In short, it looks like medical spending is driven by demand effects (we care, so we spend to show we care) rather than supply effects (medicine is better, so we consume more) or efficacy (we don’t keep good records of how effective various doctors are). His proposal for how to fund medicine shows what he thinks a saner system would look like. (As ‘cut medicine in half’ suggests, he doesn’t think average medical spending has a non-positive effect, but he does think marginal medical spending has one, to a very deep degree.)
Second is the efficiency literature on medicine. This is statisticians and efficiency experts and so on trying to apply standard industrial techniques to medicine and getting pushback that looks ludicrous to me. For example, human diagnosticians perform at or below the level of simple algorithms (I’m talking linear regressions here, not even neural networks or decision trees or so on), and this has been known in the efficiency literature for well over fifty years. Only in rare cases does this actually get implemented in practice (for example, a flowchart for dealing with heart attacks in emergency rooms was popularized a few years back and seems to have had widespread acceptance). It’s kind of horrifying to realize that our society is smarter about, say, streamlining the production of cars than we are about streamlining the production of health, especially given the truly horrifying scale of medical errors. Stories like that of Semmelweis, and the difficulty of getting doctors to wash their hands between patients, further support this view.
Third is from ‘the other side’; my father was a pastor and thus spent quite some time with dying people and their families. His experience, which is echoed by Yvain in Who By Very Slow Decay and seems to be the common opinion among end-of-life professionals in general, is that the person receiving end-of-life care generally doesn’t want it and would rather die in peace, and the people around them insist that they get it (mostly so that they don’t seem heartless). As Yvain puts it:
Robin Hanson sometimes writes about how health care is a form of signaling, trying to spend money to show you care about someone else. I think he’s wrong in the general case – most people pay their own health insurance – but I think he’s spot on in the case of families caring for their elderly relatives. The hospital lawyer mentioned during orientation that it never fails that the family members who live in the area and have spent lots of time with their mother/father/grandparent over the past few years are willing to let them go, but someone from 2000 miles away flies in at the last second and makes ostentatious demands that EVERYTHING POSSIBLE must be done for the patient.
Once you really grok that a huge amount of medical spending is useless torture, and if you are familiar with what it looks like to design a system to achieve an end, it becomes impossible to see the point of our medical system as healing people.
I broadly disagree with the Hansonian take on medicine. I don’t think MetaMed went bust because it offered more effective healing to a market that doesn’t really demand healing; rather, I think medicine is about healing, generally does this pretty well, and MetaMed was unable to provide a significant edge in performance over standard medicine. (I should note I am a doctor, albeit a somewhat contrarian one. I wrote the 80k careers guide on medicine.)
I think medicine is generally less fertile ground for Hansonian signalling accounts, principally because health is so important for our life and happiness we’re less willing to sacrifice it to preserve face (I’d wager it is an even better tax on bs than money). If the efficacy of marginal health spending is near zero in rich countries, that seems evidence in support of ‘medicine is really about healing’: we want to live healthily so much we chase the returns curve all the way to zero!
There are all manner of ways in which Western medicine does badly, but I think sometimes the faults are overblown, and the remainder are best explained by human failings rather than by medicine being a sham practice:
1) My understanding of the algorithms for diagnosis is that although linear regressions and simple methods can beat humans at very precise diagnostic questions (e.g. ‘Given these factors of a patient who is mentally ill, what is their likelihood of committing suicide?’), humans still have better performance in messier (and more realistic) situations. It’d be surprising for IBM to unleash Watson on a very particular aspect of medicine (therapeutic choice in oncology) if simple methods could beat doctors across most of the board.
(I’d be very interested to see primary sources if my conviction is mistaken)
2) Medicine has become steadily more and more protocolized, and clinical decision rules, standard operating procedures and standards of care are proliferating rapidly. I agree this should have happened sooner: that Atul Gawande’s surgical checklist happened within living memory is amazing, but it is catching on, and (mildly against Hansonian explanations) has been propelled by better outcomes.
I can’t speak for the US, but there are clear protocols in the UK about initial emergency management of heart attacks. Indeed, take a gander at the UK’s ‘NICE Pathways’ which gives a flow chart on how to act in all circumstances where a heart attack is suspected.
3) I agree that the lack of efficacy information about individual doctors isn’t great. Reliable data on this is far from trivial to acquire, however, and that, together with doctors’ understandable self-interest in not being too closely monitored, seems to explain this lacuna as well as the Hansonian story does. (Patients tend to want to know this information if it is available, which doesn’t fit well with them colluding with their doctors and family in a medical ritual unconnected to their survival.)
4) Over-treatment is rife, but the US is generally held up as an anti-exemplar of this fault, and (at least judging by the anecdotes) medics in the UK are better (albeit still far from perfect) at not flogging the patient to death with medical torture. Outside of this zero or negative margin, performance is better: it is unclear how much is attributable to medicine, but life expectancy and disease-free life expectancy are rising, and age-standardized mortality rates for most conditions are declining.
Now, why Metamed failed (I appreciate one should get basically no credit for predicting a start up will fail given this is the usual outcome, but I called it a long time ago):
MetaMed’s business model relied on there being a lot of low-hanging fruit to pluck: that in many cases a diagnosis or treatment would elude the clinician because they weren’t apprised of the most recent evidence, were only able to deal in generalities rather than personalized recommendations, or were just less adept at synthesizing the evidence available.
If it were MetaMed versus the average doctor (the one who spends next to no time reading academic papers, who is incredibly busy, stressed out, and so on), you’d be forgiven for thinking that MetaMed had an edge. However, medics (especially generalists) have long realized they have no hope of keeping abreast of a large medical literature on their own. Enter division of labour: they instead commission the relevant experts to survey, aggregate and summarize the current state of the evidence base, leaving them the simpler task of applying it in their practice. To make sure it stays up to date, they commission the experts to repeat this fairly often.
I mentioned NICE (the National Institute for Health and Care Excellence) earlier. They’re a body in the UK responsible (inter alia) for deciding which drugs and treatments get funded on the NHS. They spend a vast amount of time on evidence synthesis and meta-analysis. To see what sort of work this produces, google ‘NICE {condition}’. An example for depression is here. Although I think the UK is world-leading in this respect, there are similar bodies in other countries, as well as commercial organizations (e.g. UpToDate).
Against this, MetaMed never had any edge: they didn’t have groups of subject matter experts to call upon for each condition or treatment in question, nor (despite a lot of mathsy inclination amongst them) did they by and large have parity in meta-analysis, evidence synthesis and related skills. They were also outmatched in the quantity of man-hours that could be deployed, and by the great head start NICE et al. already had. When their website was still up I looked at some of their example reports, and my view was they were significantly inferior to what you could get via NICE (for free!) or via UpToDate and similar services for lower fees.
MetaMed might have had a hope if, in the course of producing these general evidence summaries, a lot of fine-grained data was being aggregated out to produce something ‘one size fits all’: their edge would be going back to the original data to find out that although drug X is generally good for a condition, in one’s particular case, in virtue of age, genotype, or whatever else, drug Y is superior.
However, this data by and large does not exist: much of medicine is still at the stage of working out whether something works generally, rather than delving into differential response and efficacy. It is not clear it ever will—humans might be sufficiently similar to one another that for almost all of them one treatment will be the best. The general success of increasing protocolization in medicine is some further weak evidence of this point.
I generally adduce MetaMed as an example of rationalist overconfidence: the belief that insurgent Bayesians can just trounce the relevant professionals at what they purport to do, thanks to signalling etc. But again, given the expectation was for it to fail (as most start-ups do), this doesn’t provide evidence. If it had succeeded, I’d have updated much more strongly towards the magic of rationalism meaning you can win, and towards the world being generally dysfunctional.
principally because health is so important for our life and happiness we’re less willing to sacrifice it to preserve face (I’d wager it is an even better tax on bs than money).
I agree that I expect people to be more willing to trade money for face than health for face. I think the system is slanted too heavily towards face, though.
I should also point out that this is mostly a demand side problem. If it were only a supply side problem, MetaMed could have won, but it’s not—people are interested in face more than they’re interested in health (see the example of the outdated brochure that was missing the key medical information, but looked like how a medical brochure is supposed to look).
It’d be surprising for IBM to unleash Watson on a very particular aspect of medicine (therapeutic choice in oncology) if simple methods could beat doctors across most of the board.
My understanding is that this is correct for the simple techniques, but incorrect for the complicated techniques. That is, you’re right that a single linear regression can’t replace a GP, but an NLP engine plus a twenty questions bot plus a causal network probably could. (I unfortunately don’t have any primary sources at hand; medical diagnostics is an interest, but most of the academic citations I know are for machine diagnostics, since that’s what my research was in.)
I should also mention that, from the ML side, the technical innovation of Watson is in the NLP engine. That is, a patient could type English into a keyboard and Watson would mostly understand what they’re saying, instead of needing a nurse or doctor to translate the English into the format needed by the diagnostic tool. The main challenge with uptake of the simple techniques historically was that they only did the final computation, but most of the work in diagnostics is collecting the information from the patient. And so if the physician is 78% accurate and the linear regression is 80% accurate, is it really worth running the numbers for that extra 2%?
From a business standpoint, I think it’s obvious why IBM is moving slowly; just like with self-driving cars, the hard problems are primarily legal and social, not technical. Even if Watson has half the error rate of a normal doctor, the legal liability status is very different, just like a self-driving car that has half the error rate of a human driver would result in more lawsuits for the manufacturer, not fewer. As well, if the end goal is to replace doctors, the right way to do that is to imperceptibly hand more and more work over to the machines, not to jump out of the gate with a “screw you, humans!”
I agree this should have happened sooner: that Atul Gawande’s surgical checklist happened within living memory is amazing, but it is catching on, and (mildly against Hansonian explanations) has been propelled by better outcomes.
So, just like the Hansonian view of Effective Altruism is that it replaces Pretending to Try not with Actually Trying but with Pretending to Actually Try, if there is sufficient pressure to pretend to care about outcomes then we should expect people to move towards better outcomes as their pretending has nonzero effort.
But I think you can look at the historical spread of anesthesia vs. the historical spread of antiseptics to get a sense of the relative importance of physician convenience and patient outcomes. (This is, I think, a point brought up by Gawande.)
I think I agree with your observations about MetaMed’s competition but not necessarily about your interpretation. That is, MetaMed could have easily failed for both the reasons that its competition was strong and that its customers weren’t willing to pay for its services. I put more weight on the latter because the experience that MetaMed reported was mostly not “X doesn’t want to pay $5k for what they can get for free from NICE” but “X agrees that this is worth $100k to them, but would like to only pay me $5k for it.” (This could easily be a selection effect issue, where everyone who would choose NICE instead is silent about it.)
However, this data by and large does not exist: much of medicine is still at the stage of working out whether something works generally, rather than delving into differential response and efficacy. It is not clear it ever will—humans might be sufficiently similar to one another that for almost all of them one treatment will be the best. The general success of increasing protocolization in medicine is some further weak evidence of this point.
This is why I’m most optimistic about machine medicine, because it basically means instead of going to a doctor (who is tired / stressed / went to medical school twenty years ago and only sort of keeps up) you go to the interactive NICE protocol bot, which asks you questions / looks at your SNPs and tracked weight/heart rate/steps/sleep/etc. data / calls in a nurse or technician to investigate a specific issue, diagnoses the issue and prescribes treatment, then follows up and adjusts its treatment outcome expectations accordingly.
(Sorry for delay, and thanks for the formatting note.)
My knowledge is not very up to date re. machine medicine, but I did get to play with some of the commercially available systems, and I wasn’t hugely impressed. There may be a lot more impressive results yet to be released commercially but (appealing back to my priors) I think I would have heard of it as it would be a gamechanger for global health. Also, if fairly advanced knowledge work of primary care can be done by computer, I’d expect a lot of jobs without the protective features of medicine to be automated.
I agree that machine medicine along the lines you suggest will be superior to human performance, and I anticipate this will be achieved (even if I am right and it hasn’t already happened) fairly soon. I think medicine will survive less through the cognitive skill required than through technical facility and social interaction, where machines comparatively lag (of course, I anticipate they will steadily get better at this too).
I grant that a Hansonian account can accommodate the sort of ‘guided by efficacy’ data I suggest via ‘pretending to actually try’ considerations, but I would suggest this almost becomes an epicycle: any data which supports medicine being about healing can be explained away by the claim that they’re only pretending to be about healing as a circuitous route to signalling. I would say the general ethos of medicine (EBM, proliferation of trials) looks like a pro tanto reason in favour of it being about healing, and divergence from this (e.g. what happened to Semmelweis, other lags) is better explained by doctors being imperfect and selfish, and patients irrational, than by both parties adeptly following a signalling account.
But I struggle to see what evidence could neatly distinguish between these cases. If you have an idea, I’d be keen to hear it. :)
I agree with the selection worry re. MetaMed’s customers: they are also presumably selected from people whom modern medicine didn’t help, which may also have some effects (not to mention making MetaMed’s task harder, as their pool will be harder to treat than the unselected-for-failure cases who see the doctor ‘first line’). I’d also (with all respect meant to the staff of MetaMed) suggest they may not be the most objective sources on why it failed: I’d guess people would prefer to say their startups failed because of the market or product-market fit, rather than ‘actually, our product was straight worse than our competitors’.
But I struggle to see what evidence could neatly distinguish between these cases. If you have an idea, I’d be keen to hear it. :)
I’m not sure there’s much of a difference between the “doctors care about healing, but run into imperfection and selfishness” interpretation and the “doctors optimize for signalling, but that requires some healing as a side effect” interpretation besides which piece goes before the ‘but’ and which piece goes after.
The main difference I do see is that if ‘selfishness’ means ‘status’ then we might see different defection than if ‘selfishness’ means ‘greed.’ I’m not sure there’s enough difference between them for a clear comparison to be made, though. Greedy doctors will push for patients to do costly but unnecessary procedures, but status-seeking doctors will also push for patients to do costly but unnecessary procedures because it makes them seem more important and necessary.
Regarding arguments that the allocation of medical resources, particularly in the U.S., is wasteful and harmful in many cases: I agree in general, though the specifics are messy, and I don’t find Robin’s posts on the matter very well argued*. I’m most interested in this bit:
This is statisticians and efficiency experts and so on trying to apply standard industrial techniques to medicine and getting pushback that looks ludicrous to me. For example, human diagnosticians perform at or below the level of simple algorithms (I’m talking linear regressions here, not even neural networks or decision trees or so on), and this has been known in the efficiency literature for well over fifty years.
Particularly since your initial claim that had me raising eyebrows was that MetaMed failed because they have great diagnostics, but medicine doesn’t want good diagnostics.
Edit: *In the RAND post he argues that lower co-pays in a well insured population resulted in no marginal benefit of health (I’m unconvinced by this but I’d rather not go there), therefore the fact that most studies show a positive effect of medicine is a sham. I’m not sure if he thinks that statins and insulin are a scam but this is a bold and unjustified conclusion. The RAND experiment is not equipped to evaluate the overall healthcare effects of medicine, and that was not its main purpose—it was for examining healthcare utilization. The specific health effects of common interventions are known by studying them directly, and getting patients to follow the treatment protocols that get those results is, as far as I know, an unsolved problem.
Particularly since your initial claim that had me raising eyebrows was that MetaMed failed because they have great diagnostics, but medicine doesn’t want good diagnostics.
Ah, that’s a slightly broader claim than the one I wanted to make. MetaMed, especially early on, optimized for diagnostics and very little else, and so ran into problems like “why is the report I paid $5,000 for so poorly typeset?”. So it’s not that medicine / patients want bad diagnostics ceteris paribus, but that the tradeoffs they make between the various features of medical care make it clear that healing isn’t the primary goal.
The RAND experiment is not equipped to evaluate the overall healthcare effects of medicine, and that was not its main purpose—it was for examining healthcare utilization.
As I understand it, the study measured health outcomes at the beginning and end of the study, as well as utilization during the study. The group with lower copays consumed much more medicine than the group with higher copays, but was no healthier. This suggests that the marginal bit of medicine—i.e. the piece that people don’t consume, but would if it were cheaper or do consume but wouldn’t if it were more expensive—doesn’t have a net impact. (Anything that it would do to help is countered by the risks of interacting with the medical system, say.)
I think I should also make it clear that there’s a difference between medicine, the attempt to heal people, and Medicine, the part of our economy devoted to such, just like there’s a distinction between science and Science. One could make a similar claim that Science Isn’t About Discovery, for example, which would seem strange if one is only thinking about “the attempt to gain knowledge” instead of the actual academia-government-industry-journal-conference system. Most of Robin’s work is on medical spending specifically, i.e. medicine as actually practiced instead of how it could be practiced.
“People evaluated this report solely using non-medical considerations” is not the same as “medical considerations aren’t the primary goal” in the way that is normally understood. The non-medical considerations serve as a filter.
I want to read a book with a good story (let’s call that a good book). However, I don’t want to read a good book that will cost me $5000 to read. By your definition, that means that my primary goal is not to read a good book, my primary goal is to read a cheap enough book.
That is not how most people use the phrase “primary goal”.
The non-medical considerations serve as a filter. … That is not how most people use the phrase “primary goal”.
Which suggests to me that those are the primary goal. Now, you might say “but most people are homo hypocritus, not homo economicus, so ‘primary goal’ should mean ‘stated goal’ instead of ‘actual goal’.” And if that’s your reply, go back and reread all my posts replacing “primary goal” with “actual goal,” because the wording isn’t specific.
I want to read a book with a good story (let’s call that a good book). However, I don’t want to read a good book that will cost me $5000 to read. By your definition, that means that my primary goal is not to read a good book, my primary goal is to read a cheap enough book.
Your primary goal is your life satisfaction, and good books are only one way to achieve that; if you think you can get more out of $5k worth of spending in other areas than on books, this lines up with my model.
(I will note, though, that one can view many college classes as the equivalent of “spending $5000 to read a good book.”)
The relevant comparison is more like this one: suppose you preferred a poorly reviewed book with a superior cover to a well reviewed book with an inferior cover. Then we could sensibly conclude that you care more about the cover than the reviews, even if you verbally agree that reviews are more likely to be indicative of quality than the cover.
And if that’s your reply, go back and reread all my posts replacing “primary goal” with “actual goal,” because the wording isn’t specific.
It doesn’t make any more sense with that. Pretty much nobody would say that because they wouldn’t do Y if X wasn’t true, X is their actual goal for Y. Any term that they would use is such that substituting it in makes your original statement not very insightful.
(For instance, most people wouldn’t call X a primary goal or an actual goal, but they might call X a necessary condition. But if you were to say “people found something other than healing to be a necessary condition for buying a report”, that would not really say much that isn’t already obvious.)
The relevant comparison is more like this one: suppose you preferred a poorly reviewed book with a superior cover to a well reviewed book with an inferior cover.
I prefer a poorly reviewed book that costs $10 to a well reviewed book that costs $5000. By your reasoning I “care more about the price than about the reviews”.
(I will note, though, that one can view many college classes as the equivalent of “spending $5000 to read a good book.”)
Pretty much nobody would say that because they wouldn’t do Y if X wasn’t true, X is their actual goal for Y.
I think this reveals our fundamental disagreement: I am describing people, not repeating people’s self-descriptions, and since I am claiming that people are systematically mistaken about their self-descriptions, of course there should be a disagreement between them!
That is, suppose Alice “goes to restaurants for the food” but won’t go to any restaurants that have poor decor / ambiance, but will go to restaurants that have good ambiance and poor food. If Bob suggests to Alice that they go to a hole-in-the-wall restaurant with great food, and Alice doesn’t like it or doesn’t go, then an outside observer seems correct in saying that Alice’s actual goal is the ambiance.
Now, sure, Alice could be assessing the experience along many dimensions and summing them in some way. But typically there is a dominant feature that overrides other concerns, or the tradeoffs seem to heavily favor one dimension (perhaps there need to be five units of food quality increase to outweigh one unit of ambiance quality decrease), which cashes out to the same thing when there’s a restricted range.
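The ‘restricted range’ point can be checked with a toy model. This is only a sketch under assumed numbers: ratings run from 1 to 5 on both dimensions, and one unit of ambiance is worth five units of food. The names `score` and `ambiance_always_dominates` are illustrative, not anything from the discussion above.

```python
from itertools import product

AMBIANCE_WEIGHT = 5      # assumption: five units of food offset one unit of ambiance
RATINGS = range(1, 6)    # assumption: both qualities are rated on a restricted 1..5 scale


def score(food, ambiance):
    """Alice's hypothetical overall score: a plain weighted sum of the two ratings."""
    return food + AMBIANCE_WEIGHT * ambiance


def ambiance_always_dominates():
    """Check whether a better-ambiance restaurant always outscores a worse-ambiance one."""
    restaurants = list(product(RATINGS, RATINGS))  # every (food, ambiance) pair
    for (food_a, amb_a), (food_b, amb_b) in product(restaurants, repeat=2):
        if amb_a > amb_b and score(food_a, amb_a) <= score(food_b, amb_b):
            return False
    return True


print(ambiance_always_dominates())
```

Within that range the largest possible food gap is four units, which can never offset a single unit of ambiance, so an observer watching Alice’s choices could not tell the weighted sum apart from “ambiance first, food as a tie-breaker”.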
I prefer a poorly reviewed book that costs $10 to a well reviewed book that costs $5000. By your reasoning I “care more about the price than about the reviews”.
I think you do care more about the price than about the reviews? That is, if there were a book that cost $5k and there were a bunch of people who had read it and said that the experience of reading it was life-changingly good and totally worth $5k, and you decided not to spend the money on the book, it’s clear that you’re not in the most hardcore set of story-chasers, but instead you’re a budget-conscious story-chaser.
To bring it back to MetaMed, oftentimes the work that they did was definitely worth the cost. People pay hundreds of thousands of dollars for treatment of serious conditions, and so the idea of paying five thousand dollars to get more diagnostic work done to make sure the other money is well-spent is not obviously a strange or bad idea, whereas paying $5k for a novel is outlandish.
That’s fighting the hypothetical.
I don’t see why you think that. You could argue it’s reference class tennis, but if your point is “people don’t do weird thing X” and in fact people do weird thing X in a slightly different context, then we need to reevaluate what is generating the weirdness. If people do actually spend thousands of dollars in order to read a book (and be credentialed for having read it), then a claim that you don’t want to spend for it becomes a statement about you instead of about people in general, or a statement about what features you find most relevant.
(I don’t know your educational history, but suppose I was having this conversation with an English major who voluntarily took college classes on reading books; clearly the class experience of discussing the book, or the pressure to read the book by Monday, is what they’re after in a deeper way than they were after reading the book. If they just cared about reading the book, they would just read the book.)
I am describing people, not repeating people’s self-descriptions, and since I am claiming that people are systematically mistaken about their self-descriptions, of course there should be a disagreement between them!
I’m complaining about your terminology. Terminology is about which meaning your words communicate. Being wrong about one’s self-description is about whether the meaning you intend to communicate by your words is accurate. These are not the same thing and you can easily get one of them wrong independently of the other.
I think you do care more about the price than about the reviews? That is...
The sentence after the “that is” is a nonstandard definition of “caring more about the price than about the reviews”.
That’s fighting the hypothetical.
I don’t see why you think that.
It’s fighting the hypothetical because the hypothetical is that I do not want to pay $5000 for a book. Pointing out that there are situations where people want to pay $5000 for a book disputes whether the situation laid out in the hypothetical actually happens. That’s fighting the hypothetical. Even if you’re correct, whether the situation described in the hypothetical can actually happen is irrelevant to the point the hypothetical is being used to make.
but if your point is “people don’t do weird thing X”
My point is not “people don’t do weird thing X”, my point is that people do not use the term X for the type of situation described in the hypothetical. A situation does not have to actually happen in order for people to use terms to describe it.
A good place to get started there is Epistemology and the Psychology of Human Judgment, summarized on LW by badger
Thanks, I’ll try to find the relevant parts.
This suggests that the marginal bit of medicine—i.e. the piece that people don’t consume, but would if it were cheaper or do consume but wouldn’t if it were more expensive—doesn’t have a net impact
I didn’t want to get too in depth in this discussion, because I don’t actually disagree with the weak conclusion that a lot of people receive too much healthcare and that completely free healthcare is probably a bad idea. But Robin Hanson doesn’t stop there; he concludes that the rest of medicine is a sham and the fact that other studies show otherwise is a scandal. As to why I don’t buy this: the RAND experiment does not show that health outcomes do not improve. It shows that certain measured metrics do not show a statistically significant improvement across the whole population. In fact, in the original paper, the risk of dying was decreased for the poor, high-risk group but not for the entire population. Which brings up a more general problem: such a study is obviously going to be underpowered for any particular clinical question, and it isn’t capable of detecting benefits that lie outside of those metrics.
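As a rough illustration of the underpowering point (a sketch, not an analysis of the RAND data): the standard normal approximation for a two-sided two-proportion z-test shows how quickly power falls when a real but modest effect is sought in a small group. The mortality rates and group sizes below are made-up numbers chosen only for illustration, and `approx_power` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p_control, p_treated, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Uses the usual normal approximation and ignores the (tiny) chance of
    rejecting in the wrong direction.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between the two observed proportions.
    se = sqrt(p_control * (1 - p_control) / n_per_group
              + p_treated * (1 - p_treated) / n_per_group)
    return std_normal.cdf(abs(p_control - p_treated) / se - z_crit)


# Illustrative only: a drop in mortality from 2.0% to 1.5% (assumed figures).
for n in (1_000, 10_000, 100_000):
    print(n, round(approx_power(0.020, 0.015, n), 2))
```

Under these assumed figures, a study with a thousand people per arm would usually fail to reach significance even though the effect is real, which is the sense in which a null result on a broad metric says little about any particular intervention.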
Three main sources. (But first the disclaimer About Isn’t About You seems relevant—that is, even if medicine is all a sham (which I don’t believe), participating in the medical system isn’t necessarily a black mark on you personally.)
First is Robin Hanson’s summary on the literature on health economics. The medicine tag on Robin’s blog has a lot, but a good place to start is probably Cut Medicine in Half and Medicine as Scandal followed by Farm and Pet Medicine and Dog vs. Cat Medicine. To summarize it shortly, it looks like medical spending is driven by demand effects (we care so we spend to show we care) rather than supply effects (medicine is better so we consume more) or efficacy (we don’t keep good records of how effective various doctors are). His proposal for how to fund medicine shows what he thinks a more sane system would look like. (As ‘cut medicine in half’ suggests, he doesn’t think the average medical spending has a non-positive effect, but that the marginal medical spending does, to a very deep degree.)
Second is the efficiency literature on medicine. This is statisticians and efficiency experts and so on trying to apply standard industrial techniques to medicine and getting pushback that looks ludicrous to me. For example, human diagnosticians perform at the level of, or worse than, simple algorithms (I’m talking linear regressions, here, not even neural networks or decision trees or so on), and this has been known in the efficiency literature for well over fifty years. Only in rare cases does this actually get implemented in practice (for example, a flowchart for dealing with heart attacks in emergency rooms was popularized a few years back and seems to have had widespread acceptance). It’s kind of horrifying to realize that our society is smarter about, say, streamlining the production of cars than we are about streamlining the production of health, especially given the truly horrifying scale of medical errors. Stories like Semmelweis and the difficulty getting doctors to wash their hands between patients further reinforce this view.
Third is from ‘the other side’; my father was a pastor and thus spent quite some time with dying people and their families. His experience, which is echoed by Yvain in Who By Very Slow Decay and seems to be the common opinion among end-of-life professionals in general, is that the person receiving end-of-life care generally doesn’t want it and would rather die in peace, and the people around them insist that they get it (mostly so that they don’t seem heartless). As Yvain puts it:
Once you really grok that a huge amount of medical spending is useless torture, and if you are familiar with what it looks like to design a system to achieve an end, it becomes impossible to see the point of our medical system as healing people.
[edit]And look at today’s Hanson post!
I broadly differ with the Hansonian take on medicine. I don’t think MetaMed offered more effective healing and went bust because medicine doesn’t really demand healing; rather, I think medicine is about healing, generally does this pretty well, and MetaMed was unable to provide a significant edge in performance over standard medicine. (I should note I am a doctor, albeit a somewhat contrarian one. I wrote the 80k careers guide on medicine.)
I think medicine is generally less fertile ground for hansonian signalling accounts, principally because health is so important for our life and happiness we’re less willing to sacrifice it to preserve face (I’d wager it is an even better tax on bs than money). If the efficacy of marginal health spending is near zero in rich countries, that seems evidence in support of, ‘medicine is really about healing’ - we want to live healthily so much we chase the returns curve all the way to zero!
There are all manner of ways in which western world medicine does badly, but I think sometimes the faults are overblown, and the remainder are best explained by human failings rather than medicine being a sham practice:
1) My understanding of the algorithms for diagnosis is that although linear regressions and simple methods can beat humans at very precise diagnostic questions (e.g. ‘Given these factors of a patient who is mentally ill, what is their likelihood of committing suicide?’), humans still have better performance in messier (and more realistic) situations. It’d be surprising for IBM to unleash Watson on a very particular aspect of medicine (therapeutic choice in oncology) if simple methods could beat doctors across most of the board.
(I’d be very interested to see primary sources if my conviction is mistaken)
2) Medicine has become steadily more and more protocolized, and clinical decision rules, standard operating procedures and standards of care are proliferating rapidly. I agree this should have happened sooner: that Atul Gawande’s surgical checklist happened within living memory is amazing, but it is catching on, and (mildly against Hansonian explanations) has been propelled by better outcomes.
I can’t speak for the US, but there are clear protocols in the UK about initial emergency management of heart attacks. Indeed, take a gander at the UK’s ‘NICE Pathways’ which gives a flow chart on how to act in all circumstances where a heart attack is suspected.
3) I agree that the lack of efficacy information about individual doctors isn’t great. Reliable data on this is far from trivial to acquire, however, and that, together with doctors’ understandable self-interest in not being too closely monitored, seems to explain this lacuna as well as the Hansonian story does. (Patients tend to want to know this information if it is available, which doesn’t fit well with them colluding with their doctors and family in a medical ritual unconnected to their survival.)
4) Over-treatment is rife, but the US is generally held up as an anti-exemplar of this fault, and (at least judging by the anecdotes) medics in the UK are better (albeit still far from perfect) at not flogging the patient to death with medical torture. Outside of this zero or negative margin, performance is better: it is unclear how much is attributable to medicine, but life expectancy and disease-free life expectancy are rising, and age-standardized mortality rates for most conditions are declining.
Now, why Metamed failed (I appreciate one should get basically no credit for predicting a start up will fail given this is the usual outcome, but I called it a long time ago):
MetaMed’s business model relied on there being a lot of low-hanging fruit to pluck: that in many cases, a diagnosis or treatment would elude the clinician because they weren’t apprised of the most recent evidence, were only able to deal in generalities rather than personalized recommendations, or were just less adept at synthesizing the evidence available.
If it were Metamed versus the average doctor—the one who spends next-to-no time reading academic papers, who is incredibly busy, stressed out, and so on, you’d be forgiven for thinking that metamed has an edge. However, medics (especially generalists) have long realized they have no hope of keeping abreast of a large medical literature on their own. Enter division of labour: they instead commission the relevant experts to survey, aggregate and summarize the current state of the evidence base, leaving them the simpler task of applying in their practice. To make sure it was up to date, they’d commission the experts to repeat this fairly often.
I mentioned NICE (the National Institute for Health and Care Excellence) earlier. They’re a body in the UK who are responsible (inter alia) for deciding when drugs and treatments get funded on the NHS. They spend a vast amount of time on evidence synthesis and meta-analysis. To see what sort of work this produces google ‘NICE {condition}’. An example for depression is here. Although I think the UK is world leading in this aspect, there are similar bodies in other countries, as well as commercial organizations (e.g. UpToDate).
Against this, MetaMed never had any edge: they didn’t have groups of subject matter experts to call upon for each condition or treatment in question, nor (despite a lot of mathsy inclination amongst them) did they by and large have parity in terms of meta-analysis, evidence synthesis and related skills. They were also outmatched in terms of the quantity of man hours that could be deployed, and the great headstart NICE et al. already had. When their website was still up I looked at some of their example reports, and my view was they were significantly inferior to what you could get via NICE (for free!) or UpToDate or similar services for their lower fees.
MetaMed might have had a hope if, in the course of producing these general evidence summaries, a lot of fine-grained data was being aggregated out to produce something ‘one size fits all’. Their edge would be going back to the original data to find out that although generally drug X is good for a condition, in one’s particular case, in virtue of age, genotype, or whatever else, drug Y is superior.
However, this data by and large does not exist: much of medicine is still at the stage of working out whether something works generally, rather than delving into differential response and efficacy. It is not clear it ever will—humans might be sufficiently similar to one another that for almost all of them one treatment will be the best. The general success of increasing protocolization in medicine is some further weak evidence of this point.
I generally adduce MetaMed as an example of rationalist overconfidence: the belief that insurgent Bayesians can just trounce relevant professionals at what they purport to do, thanks to signalling etc. But again, given the expectation was for it to fail (as most start-ups do), this doesn’t provide evidence. If it had succeeded, I’d have updated much more strongly towards the magic of rationalism meaning you can win and the world being generally dysfunctional.
Formatting note: the brackets for links are greedy, so you need to escape them with a \ to avoid a long link.
[Testing] a long link
[Testing] a short link
I agree that I expect people to be more willing to trade money for face than health for face. I think the system is slanted too heavily towards face, though.
I should also point out that this is mostly a demand side problem. If it were only a supply side problem, MetaMed could have won, but it’s not—people are interested in face more than they’re interested in health (see the example of the outdated brochure that was missing the key medical information, but looked like how a medical brochure is supposed to look).
My understanding is that this is correct for the simple techniques, but incorrect for the complicated techniques. That is, you’re right that a single linear regression can’t replace a GP, but an NLP engine plus a twenty questions bot plus a causal network probably could. (I unfortunately don’t have any primary sources at hand; medical diagnostics is an interest but most of the academic citations I know are all machine diagnostics, since that’s what my research was in.)
I should also mention that, from the ML side, the technical innovation of Watson is in the NLP engine. That is, a patient could type English into a keyboard and Watson would mostly understand what they’re saying, instead of needing a nurse or doctor to translate the English into the format needed by the diagnostic tool. The main challenge with uptake of the simple techniques historically was that they only did the final computation, but most of the work in diagnostics is collecting the information from the patient. And so if the physician is 78% accurate and the linear regression is 80% accurate, is it really worth running the numbers for those extra 2%?
From a business standpoint, I think it’s obvious why IBM is moving slowly; just like with self-driving cars, the hard problems are primarily legal and social, not technical. Even if Watson has half the error rate of a normal doctor, the legal liability status is very different, just like a self-driving car that has half the error rate of a human driver would result in more lawsuits for the manufacturer, not less. As well, if the end goal is to replace doctors, the right way to do that is imperceptibly hand more and more work over to the machines, not to jump out of the gate with a “screw you, humans!”
So, just like the Hansonian view of Effective Altruism is that it replaces Pretending to Try not with Actually Trying but with Pretending to Actually Try, if there is sufficient pressure to pretend to care about outcomes then we should expect people to move towards better outcomes as their pretending has nonzero effort.
But I think you can look at the historical spread of anesthesia vs. the historical spread of antiseptics to get a sense of the relative importance of physician convenience and patient outcomes. (This is, I think, a point brought up by Gawande.)
I think I agree with your observations about MetaMed’s competition but not necessarily about your interpretation. That is, MetaMed could have easily failed for both the reasons that its competition was strong and that its customers weren’t willing to pay for its services. I put more weight on the latter because the experience that MetaMed reported was mostly not “X doesn’t want to pay $5k for what they can get for free from NICE” but “X agrees that this is worth $100k to them, but would like to only pay me $5k for it.” (This could easily be a selection effect issue, where everyone who would choose NICE instead is silent about it.)
This is why I’m most optimistic about machine medicine, because it basically means instead of going to a doctor (who is tired / stressed / went to medical school twenty years ago and only sort of keeps up) you go to the interactive NICE protocol bot, which asks you questions / looks at your SNPs and tracked weight/heart rate/steps/sleep/etc. data / calls in a nurse or technician to investigate a specific issue, diagnoses the issue and prescribes treatment, then follows up and adjusts its treatment outcome expectations accordingly.
(Sorry for delay, and thanks for the formatting note.)
My knowledge is not very up to date re. machine medicine, but I did get to play with some of the commercially available systems, and I wasn’t hugely impressed. There may be a lot more impressive results yet to be released commercially but (appealing back to my priors) I think I would have heard of it as it would be a gamechanger for global health. Also, if fairly advanced knowledge work of primary care can be done by computer, I’d expect a lot of jobs without the protective features of medicine to be automated.
I agree that machine medicine along the lines you suggest will be superior to human performance, and I anticipate this to be achieved (even if I am right and it hasn’t already happened) fairly soon. I think medicine will survive less by the cognitive skill required, but rather through technical facility and social interactions, where machines comparatively lag (of course, I anticipate they will steadily get better at this too).
I grant a Hansonian account can accommodate the sort of ‘guided by efficacy’ data I suggest via ‘pretending to actually try’ considerations, but I would suggest this almost becomes an epicycle: any data which supports medicine being about healing can be explained away by the claim that they’re only pretending to be about healing as a circuitous route to signalling. I would say the general ethos of medicine (EBM, proliferation of trials) looks like pro tanto reasons in favour of it being about healing, and divergence from this (e.g. what happened to Semmelweis, other lags) is better explained by doctors being imperfect and selfish, and patients irrational, rather than both parties adeptly following a signalling account.
But I struggle to see what evidence could neatly distinguish between these cases. If you have an idea, I’d be keen to hear it. :)
I agree with the selection worry re. MetaMed’s customers: they are also presumably selected from people whom modern medicine didn’t help, which may also have some effects (not to mention making MetaMed’s task harder, as their pool will be harder to treat than unselected-for-failure cases who see the doctor ‘first line’). I’d also (with all respect meant to the staff of MetaMed) suggest staff of MetaMed may not be the most objective sources of why it failed: I’d guess people would prefer to say their startups failed because of the market or product-market fit, rather than ‘actually, our product was straight worse than our competitors’.
I’m not sure there’s much of a difference between the “doctors care about healing, but run into imperfection and selfishness” interpretation and the “doctors optimize for signalling, but that requires some healing as a side effect” interpretation besides which piece goes before the ‘but’ and which piece goes after.
The main difference I do see is that if ‘selfishness’ means ‘status’ then we might see different defection than if ‘selfishness’ means ‘greed.’ I’m not sure there’s enough difference between them for a clear comparison to be made, though. Greedy doctors will push for patients to do costly but unnecessary procedures, but status-seeking doctors will also push for patients to do costly but unnecessary procedures because it makes them seem more important and necessary.
There’s also a metamed cofounder making the same case for their failure, here: https://thezvi.wordpress.com/2015/06/30/the-thing-and-the-symbolic-representation-of-the-thing/
Thanks for the detailed reply.
Regarding arguments that the allocation of medical resources, particularly in the U.S., is wasteful and harmful in many cases: I agree in general, though the specifics are messy, and I don’t find Robin’s posts on the matter very well argued*. I’m most interested in this bit:
Particularly since your initial claim that had me raising eyebrows was that MetaMed failed because they have great diagnostics, but medicine doesn’t want good diagnostics.
Edit: *In the RAND post he argues that lower co-pays in a well insured population resulted in no marginal benefit of health (I’m unconvinced by this but I’d rather not go there), therefore the fact that most studies show a positive effect of medicine is a sham. I’m not sure if he thinks that statins and insulin are a scam but this is a bold and unjustified conclusion. The RAND experiment is not equipped to evaluate the overall healthcare effects of medicine, and that was not its main purpose—it was for examining healthcare utilization. The specific health effects of common interventions are known by studying them directly, and getting patients to follow the treatment protocols that get those results is, as far as I know, an unsolved problem.
There’s also a metamed cofounder making the same case, here: https://thezvi.wordpress.com/2015/06/30/the-thing-and-the-symbolic-representation-of-the-thing/
A good place to get started there is Epistemology and the Psychology of Human Judgment, summarized on LW by badger.
Ah, that’s a slightly broader claim than the one I wanted to make. MetaMed, especially early on, optimized for diagnostics and very little else, and so ran into problems like “why is the report I paid $5,000 for so poorly typeset?”. So it’s not that medicine / patients want bad diagnostics ceteris paribus, but that the tradeoffs they make between the various features of medical care make it clear that healing isn’t the primary goal.
As I understand it, the study measured health outcomes at the beginning and end of the study, as well as utilization during the study. The group with lower copays consumed much more medicine than the group with higher copays, but was no healthier. This suggests that the marginal bit of medicine—i.e. the piece that people don’t consume, but would if it were cheaper or do consume but wouldn’t if it were more expensive—doesn’t have a net impact. (Anything that it would do to help is countered by the risks of interacting with the medical system, say.)
I think I should also make it clear that there’s a difference between medicine, the attempt to heal people, and Medicine, the part of our economy devoted to such, just like there’s a distinction between science and Science. One could make a similar claim that Science Isn’t About Discovery, for example, which would seem strange if one is only thinking about “the attempt to gain knowledge” instead of the actual academia-government-industry-journal-conference system. Most of Robin’s work is on medical spending specifically, i.e. medicine as actually practiced instead of how it could be practiced.
“People evaluated this report solely using non-medical considerations” is not the same as “medical considerations aren’t the primary goal” in the way that is normally understood. The non-medical considerations serve as a filter.
I want to read a book with a good story (let’s call that a good book). However, I don’t want to read a good book that will cost me $5000 to read. By your definition, that means that my primary goal is not to read a good book, my primary goal is to read a cheap enough book.
That is not how most people use the phrase “primary goal”.
Which suggests to me that those are the primary goal. Now, you might say “but most people are homo hypocritus, not homo economicus, so ‘primary goal’ should mean ‘stated goal’ instead of ‘actual goal’.” And if that’s your reply, go back and reread all my posts replacing “primary goal” with “actual goal,” because the wording isn’t specific.
Your primary goal is your life satisfaction, and good books are only one way to achieve that; if you think you can get more out of $5k worth of spending in other areas than on books, this lines up with my model.
(I will note, though, that one can view many college classes as the equivalent of “spending $5000 to read a good book.”)
The relevant comparison is more like this one: suppose you preferred a poorly reviewed book with a superior cover to a well reviewed book with an inferior cover. Then we could sensibly conclude that you care more about the cover than the reviews, even if you verbally agree that reviews are more likely to be indicative of quality than the cover.
It doesn’t make any more sense with that. Pretty much nobody would say that because they wouldn’t do Y if X wasn’t true, X is their actual goal for Y. Any term that they would use is such that substituting it in makes your original statement not very insightful.
(For instance, most people wouldn’t call X a primary goal or an actual goal, but they might call X a necessary condition. But if you were to say “people found something other than healing to be a necessary condition for buying a report”, that would not really say much that isn’t already obvious.)
I prefer a poorly reviewed book that costs $10 to a well reviewed book that costs $5000. By your reasoning I “care more about the price than about the reviews”.
That’s fighting the hypothetical.
I think this reveals our fundamental disagreement: I am describing people, not repeating people’s self-descriptions, and since I am claiming that people are systematically mistaken about their self-descriptions, of course there should be a disagreement between them!
That is, suppose Alice “goes to restaurants for the food” but won’t go to any restaurants that have poor decor / ambiance, but will go to restaurants that have good ambiance and poor food. If Bob suggests to Alice that they go to a hole-in-the-wall restaurant with great food, and Alice doesn’t like it or doesn’t go, then an outside observer seems correct in saying that Alice’s actual goal is the ambiance.
Now, sure, Alice could be assessing the experience along many dimensions and summing them in some way. But typically there is a dominant feature that overrides other concerns, or the tradeoffs seem to heavily favor one dimension (perhaps there need to be five units of food quality increase to outweigh one unit of ambiance quality decrease), which cashes out to the same thing when there’s a restricted range.
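To make the "cashes out to the same thing" claim concrete, here is a minimal sketch in Python. Everything in it is a made-up illustration rather than data about anyone's real preferences: the 5-to-1 weight is just the figure from the paragraph above, and the function names and quality scales are assumptions. It checks whether a "sum the dimensions" chooser and an "ambiance first, food only as a tie-breaker" chooser ever disagree over a given range of options.

```python
from itertools import product

# Assumed weight, taken from the example in the text: five units of food
# quality are needed to offset one unit of ambiance.
AMBIANCE_WEIGHT = 5


def weighted_score(food, ambiance):
    """Score a restaurant by summing its dimensions with ambiance weighted 5x."""
    return food + AMBIANCE_WEIGHT * ambiance


def prefers_weighted(a, b):
    """True if option a beats option b under the weighted sum; options are (food, ambiance)."""
    return weighted_score(*a) > weighted_score(*b)


def prefers_ambiance_first(a, b):
    """True if a beats b when ambiance dominates and food only breaks ties."""
    return (a[1], a[0]) > (b[1], b[0])


def rules_agree(food_levels, ambiance_levels):
    """Check whether the two decision rules make the same choice for every pair of options."""
    options = list(product(food_levels, ambiance_levels))
    return all(
        prefers_weighted(a, b) == prefers_ambiance_first(a, b)
        for a in options
        for b in options
    )


# Restricted range: food quality only varies over 0..4, so no food gap can
# ever reach the five units needed to outweigh one unit of ambiance.
print(rules_agree(range(5), range(5)))

# Wider range: food quality varies over 0..10, so a big enough food gap can
# overturn an ambiance difference and the two rules come apart.
print(rules_agree(range(11), range(5)))
```

In the restricted case an outside observer cannot tell the two choosers apart from behaviour alone, which is the sense in which "Alice's actual goal is the ambiance" and "Alice weighs ambiance heavily" describe the same choices.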
I think you do care more about the price than about the reviews? That is, if there were a book that cost $5k and there were a bunch of people who had read it and said that the experience of reading it was life-changingly good and totally worth $5k, and you decided not to spend the money on the book, it’s clear that you’re not in the most hardcore set of story-chasers, but instead you’re a budget-conscious story-chaser.
To bring it back to MetaMed, oftentimes the work that they did was definitely worth the cost. People pay hundreds of thousands of dollars for treatment of serious conditions, and so the idea of paying five thousand dollars to get more diagnostic work done to make sure the other money is well-spent is not obviously a strange or bad idea, whereas paying $5k for a novel is outlandish.
I don’t see why you think that. You could argue it’s reference class tennis, but if your point is “people don’t do weird thing X” and in fact people do weird thing X in a slightly different context, then we need to reevaluate what is generating the weirdness. If people do actually spend thousands of dollars in order to read a book (and be credentialed for having read it), then a claim that you don’t want to spend for it becomes a statement about you instead of about people in general, or a statement about what features you find most relevant.
(I don’t know your educational history, but suppose I was having this conversation with an English major who voluntarily took college classes on reading books; clearly the class experience of discussing the book, or the pressure to read the book by Monday, is what they’re after in a deeper way than they were after reading the book. If they just cared about reading the book, they would just read the book.)
I’m complaining about your terminology. Terminology is about which meaning your words communicate. Being wrong about one’s self-description is about whether the meaning you intend to communicate by your words is accurate. These are not the same thing and you can easily get one of them wrong independently of the other.
The sentence after the “that is” is a nonstandard definition of “caring more about the price than about the reviews”.
It’s fighting the hypothetical because the hypothetical is that I do not want to pay $5000 for a book. Pointing out that there are situations where people want to pay $5000 for a book disputes whether the situation laid out in the hypothetical actually happens. That’s fighting the hypothetical. Even if you’re correct, whether the situation described in the hypothetical can actually happen is irrelevant to the point the hypothetical is being used to make.
My point is not “people don’t do weird thing X”, my point is that people do not use the term X for the type of situation described in the hypothetical. A situation does not have to actually happen in order for people to use terms to describe it.
Thanks, I’ll try to find the relevant parts.
I didn’t want to get in too in depth into this discussion, because I don’t actually disagree with the weak conclusion that a lot of people receive too much healthcare and that completely free healthcare is probably a bad idea. But Robin Hanson doesn’t stop there, he concludes that the rest of medicine is a sham and the fact that other studies show otherwise is a scandal. As to why I don’t buy this, the RAND experiment does not show that health outcomes do not improve. It shows that certain measured metrics do not show a statistically significant improvement on the whole population. In fact in the original paper, the risk of dying was decreased for the poor high risk group but not the entire population. Which brings up a more general problem—such a study is obviously going to be underpowered for any particular clinical question, and it isn’t capable of detecting benefits that lie outside of those metrics.
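For a rough sense of what "underpowered" means here, a back-of-the-envelope sketch in Python follows. The event rates and group sizes are hypothetical numbers chosen for illustration, not figures from the RAND experiment, and the function name is my own. It uses the standard normal approximation for a two-sided test comparing two proportions.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_treated, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Uses the normal approximation and ignores the (negligible) chance of
    rejecting in the wrong direction.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    p_bar = (p_control + p_treated) / 2
    # Standard error under the null hypothesis (a common pooled rate).
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    # Standard error under the alternative (the two rates really differ).
    se_alt = sqrt(
        p_control * (1 - p_control) / n_per_group
        + p_treated * (1 - p_treated) / n_per_group
    )
    diff = abs(p_control - p_treated)
    return 1 - nd.cdf((z_crit * se_null - diff) / se_alt)


# Hypothetical: a rare outcome (1.0% vs 0.9%) with 2,000 people per arm.
print(power_two_proportions(0.010, 0.009, 2_000))

# Same hypothetical effect, but with a million people per arm.
print(power_two_proportions(0.010, 0.009, 1_000_000))
```

Under these assumed numbers, a real 10% relative reduction in a rare outcome would very likely go undetected in a study of a few thousand people, while the same effect would be hard to miss in a much larger one. That is the sense in which "no statistically significant improvement" is weaker than "no improvement".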