It’s worse than that. If you ran a machine learning algorithm to generate that decision tree based on all the papers from an authoritative source on best treatment practice and most probable diagnosis given a set of signs and symptoms, you would arrive at a limited set of possible decision trees. Assuming certain things about the algorithm, you may arrive at one and only one tree.
This means that if doctors are rational and following best practices, they cannot agree to disagree: all doctors should give the same probability distribution over diagnoses given the same input information.
This is trivially untrue in practice (from a combination of overfitting and errors), yet AI systems could obviously be built with this correctness property.
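For concreteness, a minimal sketch of that idea, assuming the literature had already been distilled into a table of (signs and symptoms → diagnosis) cases; the dataset, column names and labels here are invented purely for illustration:

```python
# Minimal sketch: learn a diagnostic decision tree from tabulated
# (signs/symptoms -> diagnosis) cases. All data here is invented;
# in reality you would have to distil such a table from the literature.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

cases = pd.DataFrame({
    "fatigue":        [1, 1, 0, 1, 0, 1],
    "muscle_wasting": [1, 0, 0, 1, 0, 1],
    "vegan_diet":     [1, 0, 0, 1, 1, 0],
    "diagnosis":      ["protein_deficiency", "anemia", "healthy",
                       "protein_deficiency", "healthy", "anemia"],
})
X, y = cases.drop(columns="diagnosis"), cases["diagnosis"]

# With fixed training data and a fixed random_state the learned tree is
# deterministic: the same inputs always yield the same tree, and therefore
# the same distribution over diagnoses for a given set of symptoms.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))

patient = pd.DataFrame([[1, 1, 1]], columns=X.columns)
print(tree.predict_proba(patient))  # P(diagnosis | this patient's signs)
```

The point the determinism makes: two copies of this procedure trained on the same corpus cannot disagree, whereas two doctors routinely do.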
When I ask ChatGPT (even telling it that the diet is vegan) for possible diagnoses of the symptoms, protein deficiency is not in the top 18 diagnoses.
The problem for the OP seems to be that his condition was far from being the most probable diagnosis given the current state of medical knowledge.
I think you could reasonably argue that your diagnosis shouldn’t always be the most likely one, though really, we’d want a full spread of the logits (if only to weigh the costs and benefits of possible treatments or further examinations). Obviously doctors don’t actually do any of that; many have barely a grasp on probability. Once I was discussing a rather trivial surgery with a consultant, non-vital, just quality-of-life stuff. It turned out one of the possible failure modes was “death”. That sounded worrying, so I asked more about it. The doctor told me not to worry, the odds were less than 1%.
I was like, “Ok, do you realise that technically a 1 in 101 chance of death for a non-necessary surgery that has only uncertain odds of even fixing the problem is quite high?”.
I had to investigate the thing myself, and it turns out the odds were more like 0.05%. And anyway, a different doctor managed to improve my condition with pharmacological methods alone.
Correct, it should be taking into account the distribution of possible diagnoses, weighted by probability, and then doing the same for each action you consider. It ends up being a combinatorial number of things to consider, trivially beyond human intelligence. The AI system need not be very smart, just willing to actually do all the multiplications of these residual probabilities.
And yeah for considering whether to use an experimental drug or treatment, same idea. Whether it’s “sufficiently safe” depends on the situation.
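As a sketch of the bookkeeping being described, here is the kind of calculation an AI could grind through mechanically; every probability and utility below is invented for illustration only:

```python
# Sketch of the "multiply everything out" bookkeeping: the expected utility
# of each candidate action under the full distribution over diagnoses.
# All numbers are invented for illustration.

p_diagnosis = {"common_cause": 0.70, "protein_deficiency": 0.25, "rare_disease": 0.05}

# utility[action][diagnosis]: how well each action works out under each
# diagnosis (treatment benefit minus side effects, intervention risk, etc.)
utility = {
    "standard_treatment": {"common_cause": 0.8, "protein_deficiency": 0.1, "rare_disease": 0.0},
    "dietary_change":     {"common_cause": 0.2, "protein_deficiency": 0.9, "rare_disease": 0.1},
    "risky_surgery":      {"common_cause": 0.6, "protein_deficiency": 0.3, "rare_disease": 0.7},
}

def expected_utility(action: str) -> float:
    # Weight each outcome by the probability of the diagnosis it assumes.
    return sum(p * utility[action][d] for d, p in p_diagnosis.items())

for action in utility:
    print(f"{action}: {expected_utility(action):.3f}")
# Pick the argmax. Nothing here is hard; it is just a pile of multiplications
# that a human will not reliably do in their head for dozens of options.
```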
And you have to learn from all the outcomes… something doctors don’t get to do. All the outcomes. Not just their own patients, or everyone on the floor, or the city, but every patient in the world you have data on.
Ultimately, a rational world would revoke a doctor’s license the moment they override an AI’s recommendation without new evidence or some justification.
I think there is a point that, absent such data (which we don’t have at that scale, and acquiring it would be non-trivial in terms of organisation, accuracy, safety and privacy issues, and so on), you need to use mechanistic knowledge of the human body and its processes to form good priors. Fair enough, but I very much doubt most doctors remember, or dare apply, that much physiology; the body is notoriously complex and chaotic, and very often even state-of-the-art knowledge is baffled by its interactions. That said, “hey, muscles need protein to keep in shape, so maybe protein deficiency could cause this?” is exactly that kind of basic mechanistic insight, and apparently no doctor got it. It’d be hilarious if GPT-4 did instead; it doesn’t even sound like much of a stretch.
Well, hindsight is 20/20. I’m not that confident that I’d be able to suggest the “obvious” association if I were given a few clinical cases without the answer attached (this seems like a lost opportunity here, OP).
To be clear, the doctors in OP’s anecdote do seem somewhat subpar (and revoking a doctor’s license if they regularly override an AI’s recommendation without new evidence or some good justification sounds like a pretty good idea, provided AI gets reliably better results than humans, but you’d have to find a way to make the doctors want to stick their necks out in the few cases where the AI is wrong), but we should refrain from piling on based on one second-hand anecdote and some personal frustration.
I could very easily write up a true story about ANY profession depicting how incompetent some of them are. So either everyone is incompetent, or I just don’t know enough about what they do to truly evaluate their work… I would rather err on the side of humility (I’ll agree that ideally we shouldn’t err at all) and charity.
My original point, though, was less “doctors are dramatically incompetent” and more specifically “when medical diagnosis by AI is discussed, doctors raise an unrealistic bar that most of them aren’t actually able to meet”. I am willing to accept that people make mistakes and are limited, but that’s exactly why they should welcome with some humility the option of tools that supplement their memory and ability to draw connections. Instead, most responses I see to the idea of AI diagnosis seem to suggest that doctors possess this strange mystical knowledge of the human body that allows them to deduce correct diagnoses from the faintest of signals by cross-referencing tiny symptoms, which is honestly ludicrous. 99% of medical diagnoses are “if you have [symptom], you probably have [the most common disease correlating with that symptom, and possibly your sex and age]”, and that’s about it. No one is saying AI could instantaneously become House MD.
Even House MD isn’t House MD. My mom, a retired physician, hated the show because of how unrealistic it was, which surprised me because I knew the people writing the show worked hard to get the details correct. The part that was unrealistic was that, in the real world, you usually don’t need a House-style absolutely correct diagnosis to successfully treat someone, even when the patient really does have something weird.
At the beginning of one episode, after the patient’s symptoms were revealed, she said, “When someone shows up at a hospital with these symptoms, you give them [this treatment].” At the end of the episode, after Dr. House carefully figured out exactly what obscure problem the patient was suffering from, the treatment they gave the patient was exactly the same treatment my mom told me at the beginning of the episode.
I… completely agree with you… so I guess I wasn’t as clear as I thought I was being in my last post. Well, self-assessment of communication skills updated, and let’s celebrate.
But just checking, do you mean AI (meaning ChatGPT, since it’s the most salient example, even though it isn’t really an AI) TODAY (obviously in a few years it very likely will be much more capable) is better than a doctor in some ways? Because I can provide plenty of example questions you can give to ChatGPT and to your doctor to compare how pertinent the responses are.
https://jamanetwork.com/journals/jamainternalmedicine/article-abstract/2804309 suggests that ChatGPT is already outperforming doctors on Reddit.
Mmmm, I’d be interested to see what happened in the 25% of cases where the doctor was better. My personal experience trying to draft my work shows that when ChatGPT fails, it’s spectacularly wrong. And ChatGPT’s glibness might give it an advantage in perceived accuracy. So yeah, it can be used to draft some stuff, that’s basically its best use in most cases, but I really wouldn’t trust it without doctor (or lawyer, coder, whatever is appropriate) supervision yet.
Being slightly more empathic isn’t better if it isn’t sufficiently reliable.
Here is an example: “My bloodwork came in, I have blood potassium at 20 mmol/L and my calcium is undetectably low, what does this mean?” ChatGPT always spouts irrelevant stuff about hyperkalemia and hypocalcemia, instead of realising that those values are way too abnormal not to be some kind of interference (any doctor should realise that, and a really good doctor might be able to say that the blood sample was likely stored in an EDTA tube instead of a heparin tube).
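To illustrate the kind of sanity check a system would need here, a toy sketch; the plausibility ranges below are rough illustrative numbers I made up, not clinical references:

```python
# Rough sketch of a pre-analytical sanity check: values far outside what is
# physiologically survivable are more likely to be interference (e.g. the
# wrong tube) than real. Ranges are illustrative only, not clinical values.

PLAUSIBLE = {                     # roughly "compatible with a living patient"
    "potassium_mmol_L": (1.5, 10.0),
    "calcium_mmol_L":   (0.5, 5.0),
}

def flag_result(analyte: str, value: float) -> str:
    lo, hi = PLAUSIBLE[analyte]
    if value < lo or value > hi:
        return (f"{analyte}={value}: implausible, suspect pre-analytical error "
                f"(e.g. EDTA contamination of the sample)")
    return f"{analyte}={value}: plausible, interpret clinically"

print(flag_result("potassium_mmol_L", 20.0))
print(flag_result("calcium_mmol_L", 0.0))
```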
So all in all, I wouldn’t summarise the article as “ChatGPT already outperforms doctors on Reddit” but rather as “ChatGPT could already be used to help draft doctors’ letters”. That is a significant nuance.
Well, I did manage to finally find a scenario outrageous enough that it just questions it:
I’m sorry, but it’s not possible for a person to be alive without their head attached to their neck. A decapitation, where the head is completely severed from the body, is a fatal injury. Severing the head causes immediate loss of blood supply and disrupts vital functions, including respiration, circulation, and neurological activity. In medical terms, survival without a head is not feasible.
If you encounter a situation where someone’s head appears to be missing, it is likely a fictional or hypothetical scenario. In real-life emergency situations, it is important to assess the person’s condition and provide appropriate medical assistance while waiting for emergency medical services to arrive.
Not really; I think you could, however, build such a thing with current tech, probably combining some LLM capabilities with a random forest or some other way of navigating the complexities of actual probabilistic diagnosis. Maybe use a transformer architecture, but with symptoms in place of tokens, and output the logits over possible diagnoses.
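A very rough sketch of what “symptoms in place of tokens, logits over diagnoses” could look like; the architecture sizes, symptom vocabulary and diagnosis labels are all invented, and the model is untrained, so this is an illustration rather than a working system:

```python
# Toy sketch in PyTorch: encode a set of symptom "tokens" with a small
# transformer encoder and read out logits over a fixed list of diagnoses.
# Vocabulary, sizes and labels are invented; untrained, illustration only.
import torch
import torch.nn as nn

SYMPTOMS = ["fatigue", "muscle_wasting", "hair_loss", "vegan_diet", "edema"]
DIAGNOSES = ["protein_deficiency", "hypothyroidism", "anemia", "other"]

class SymptomTransformer(nn.Module):
    def __init__(self, d_model: int = 64):
        super().__init__()
        self.embed = nn.Embedding(len(SYMPTOMS), d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, len(DIAGNOSES))

    def forward(self, symptom_ids: torch.Tensor) -> torch.Tensor:
        x = self.encoder(self.embed(symptom_ids))  # (batch, n_symptoms, d_model)
        return self.head(x.mean(dim=1))            # pooled -> logits over diagnoses

model = SymptomTransformer()
patient = torch.tensor([[SYMPTOMS.index(s)
                         for s in ["fatigue", "muscle_wasting", "vegan_diet"]]])
probs = torch.softmax(model(patient), dim=-1)      # distribution over diagnoses
print(dict(zip(DIAGNOSES, probs[0].tolist())))
```

Trained on real labelled cases, the softmax over these logits is exactly the “full spread” over diagnoses mentioned earlier, rather than a single top pick.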
I know IBM has tried doing this, and supposedly always failed. I don’t know the details of their work, but I’m sort of perplexed about whether it really could have been so hard to produce something that at least performs at the level of a mediocre GP and knows when to say “I don’t know, refer to a specialist”. I worry that it might have been compared to a much higher bar than is sensible to use, and that much worse doctors than it retain their license just fine because no one tests them regularly against a diagnosis benchmark.
(anyway don’t worry about the miscommunication, I think the original point got a bit lost in the following comments and we drifted away from it)