I disagree with this take and with the linked article. “The value of information is always positive” is brushing over the actual problem described by the healthcare studies: that taking a measurement is not guaranteed to accurately capture the world state, because a sensor can be faulty, and it is not always possible to distinguish a faulty sensor from a reliable sensor.
“Me: If I went to talk to him, he’d probably lie. And probably it would be impossible to check his story without spending huge amounts of time and exposing myself to danger. But I’d feel obligated to do it anyway, and while I was distracted, the true criminal would get away. That risk outweighs the chance that he’d give me something useful.”
The linked article claims this reasoning is incorrect, and clarifies further in its conclusion:
“It’s a fact that if you make decisions correctly, then putting more information into the system can’t hurt you.”
Consider sensors in a control system. We only add additional sensors to the controller if we can guarantee some level of quality from the sensor. If we can’t guarantee the sensors are valid, then each additional sensor added to the system might not be adding information at all—“garbage in, garbage out”.
The article is making the claim that a rational agent has a reliable function F(information) -> garbage? but that’s ridiculous. “How do I tell my sensors are working correctly?” is one of the hardest problems in control theory. The solution used in system design is multiple, independent measurements that can all be assessed together.
The claim in the article is that doing these tests is just that: each test is an additional, independent measurement that can be assessed against the base rate and the other risk factors or symptoms. That is technically true, except that it then glosses over the problems:
“Some of the above reasons to be careful about testing are fine. By all means, account for the costs of the CT scan itself (#1). And I’ll wearily pretend to accept that people are emotional and couldn’t understand Bayesian reasoning or false positives and so we need to worry about stressing them out (#2).
...
“If you know that a patient’s prior probability for a condition is low, you still know that after doing a test. In a sane world, wouldn’t you do the CT scan, and then… only do the biopsy only if the CT scan showed something serious enough to justify the risks?”
Look at one of the quoted studies:
“This led to a PET scan that showed no small nodules but confirmed the lesion. Doctors considered surgery but decided against it because the lesion seemed to be growing too fast to be lung cancer. One month later, the lesion had shrunk, suggesting it was just some kind of inflammation or infection.”
Bold is my own emphasis. Let’s flip a coin and look at another world state, a world state that did not occur for this patient, but has occurred for others.
“This led to a PET scan that showed no small nodules but confirmed the lesion. Doctors responded rapidly with surgery to remove the lesion. A post-surgery biopsy revealed that the lesion was not cancerous. Unfortunately, the patient was one of the 3% of people who die within 90 days of lung surgery.”[1]
In the real world, no reliable function F(information) -> garbage? exists. The idea that a test sometimes returns a false positive and that it is “obvious” to the doctors that it is a false positive is incorrect. What do you want the doctors to do? Run the test when the patient is unlikely to have cancer (they have no other symptoms), see something that looks cancerous on the test (the false positive), and then do nothing? The conclusion here seems to be “obviously the doctor will realize it was a false positive and simply not operate”, which ignores the corpus of evidence in the linked studies showing that the doctors couldn’t distinguish false positives from true positives![2]
In other words, suppose your threshold for action is “patients with cancer have these symptoms and also a mass on a CT scan”. If you replace it with an arbitrary shortcut like “only do the biopsy only if the CT scan showed something serious enough to justify the risks” (quote from the article), then you’ve tied your dangerous action (the surgery) to the thing that we know has a false positive rate: the test!
The solution offered in the article is “well don’t do the biopsy unless they also have the symptoms”. This is the “multiple, independent measurements that are assessed together” approach. Except that if they don’t have symptoms, and we’ve decided that symptoms are a prerequisite for the biopsy, then there’s no reason to do the CT scan, which is exactly what the doctors concluded in the studies that are being criticized here.
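To make the false-positive concern concrete, here is a minimal sketch of Bayes' rule applied to a positive scan. The prior, sensitivity, and specificity are made-up illustrative numbers (not taken from the linked studies), and `posterior_given_positive` is a hypothetical helper name.

```python
def posterior_given_positive(prior: float, sensitivity: float, specificity: float) -> float:
    """P(disease | positive test), computed with Bayes' rule."""
    true_pos = sensitivity * prior                    # P(positive and sick)
    false_pos = (1.0 - specificity) * (1.0 - prior)   # P(positive and healthy)
    return true_pos / (true_pos + false_pos)


# Hypothetical numbers, chosen only for illustration.
low_prior = posterior_given_positive(prior=0.01, sensitivity=0.90, specificity=0.90)
high_prior = posterior_given_positive(prior=0.30, sensitivity=0.90, specificity=0.90)

print(f"no symptoms (1% prior):    P(cancer | positive) = {low_prior:.3f}")
print(f"with symptoms (30% prior): P(cancer | positive) = {high_prior:.3f}")
```

Under these assumed numbers, most positives in the low-prior group are false positives, and nothing in the scan result itself says which ones they are.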
[1] The actual percentage is not relevant, so long as the surgery is risky, e.g. above a 0.5% mortality rate. I grabbed this 3% number from various articles like this.
[2] It is weird to me that I need to say this, but when we discuss false positives on sensors, there is for some reason an assumption that, within the context of a system, we “know” that we measured a false positive. In general, the system is not aware of a false positive; that is why false positives are a problem. The only way to “know” that a sensor returned a false positive is if you have other, independent measurements that you can use to do some type of out-of-family filtering.
“that taking a measurement is not guaranteed to accurately capture the world state, because a sensor can be faulty, and it is not always possible to distinguish a faulty sensor from a reliable sensor.”
No. Dynomight already addressed this: if you have an unreliable sensor (i.e. any sensor that has ever existed in the real world), then that simply reduces how useful it is, because it changes your posterior less than a more reliable one would. The VoI remains positive; I refer you to Ramsey and Savage on this particular point of decision theory.
All of your additional comments are generally wrong, and reflect an extremely rigid absolutist approach to making decisions. We use unreliable correlated measures all the time, this is in fact ‘technically true’ and that is the point, and yes, your entire example of doctors is simply due to irrationality and does not refute the decision theory point and it has nothing to do with ‘being obvious it is a false positive’ except in the trivial sense that for a poor measure the posterior of a true positive remains far smaller than it being a false positive and may not motivate a decision, shrinking the VoI towards zero, which will frequently be so small as to not justify the cost of testing (explicitly pointed out by Dynomight). It is definitely the case that many tests cost too much for too little information and should not be run because the VoI is often zero (for a rational decision maker) and the test is simply a loss as it will not change any decisions. Nevertheless, the value of free information is always greater than or equal to zero, and if free information makes you worse off, that implies somewhere there is an irrationality.
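The decision-theory claim can be sketched numerically. The block below is a minimal illustration, assuming a single idealized decision maker who always picks the action with the highest expected utility after seeing a costless test result. The utility table, the action names, and the helper names are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical payoffs for (action, true state); illustrative only.
UTILITY: Dict[Tuple[str, str], float] = {
    ("operate", "sick"): 80.0,
    ("operate", "healthy"): -20.0,
    ("wait", "sick"): -100.0,
    ("wait", "healthy"): 0.0,
}
ACTIONS = ("operate", "wait")


def best_expected_utility(p_sick: float) -> float:
    """Expected utility of the best available action given P(sick)."""
    return max(
        p_sick * UTILITY[(a, "sick")] + (1.0 - p_sick) * UTILITY[(a, "healthy")]
        for a in ACTIONS
    )


def value_of_free_test(prior: float, sensitivity: float, specificity: float) -> float:
    """Expected gain from seeing a costless test result before choosing an action."""
    p_pos = sensitivity * prior + (1.0 - specificity) * (1.0 - prior)
    p_neg = 1.0 - p_pos
    post_pos = sensitivity * prior / p_pos          # P(sick | positive)
    post_neg = (1.0 - sensitivity) * prior / p_neg  # P(sick | negative)
    with_test = (p_pos * best_expected_utility(post_pos)
                 + p_neg * best_expected_utility(post_neg))
    return with_test - best_expected_utility(prior)


for prior in (0.01, 0.30):
    print(prior, round(value_of_free_test(prior, sensitivity=0.9, specificity=0.9), 6))
```

With these made-up numbers, the test is worth nothing at a 1% prior because neither result changes the chosen action, and it is worth something at a 30% prior. The sketch deliberately leaves out testing costs, computation costs, and any decision maker who departs from expected-utility maximization.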
(The really relevant problem with this in the context of medicine is that decision theory is considering single agents in a stochastic environment, an idealized physician ordering tests to try to optimize patient health, because game theory hasn’t been invented yet; when you bring in multiple agents with different goals and mechanisms like lawsuits, then free information can be quite harmful, but this too is not lost on most people, including Dynomight at the end.)
NOTE: I wrote this as a separate reply because it’s addressing your points about decision theory directly, and is not about the specific scenario discussed with the medical system.
“if you have an unreliable sensor (i.e. any sensor that has ever existed in the real world), then that simply reduces how useful it is, because it changes your posterior less than a more reliable one would.”
I think the crux here is that you seem to be saying the usefulness of reading a sensor’s value is in some interval [0, 1], where 1 represents that the value provided by the sensor is perfectly trustworthy and 0 is that the value provided by the sensor is totally useless; i.e., it’s random noise. Under this belief, you’re saying that it is always rational to acquire as many sensors as possible, because there is no downside to acquiring useless sensors. When you run your filter over all of the sensors, anything that has a usefulness of 0 is going to get dropped from the final result. Likewise, low-but-non-zero usefulness sensors are weighted accordingly in the final result.
In my work, this is called sensor fusion. So far, so good.
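A minimal sketch of that weighting idea, using inverse-variance weighting as a stand-in for a full fusion filter. It assumes independent, unbiased readings whose variances are known exactly; `fuse` is a hypothetical helper name and the readings are invented.

```python
from typing import Sequence


def fuse(readings: Sequence[float], variances: Sequence[float]) -> float:
    """Inverse-variance weighted mean of independent, unbiased readings."""
    weights = [1.0 / v for v in variances]  # noisier sensor -> smaller weight
    return sum(w * r for w, r in zip(weights, readings)) / sum(weights)


readings = [10.0, 10.2, 50.0]  # third sensor is wildly off

# If the third sensor is known to be nearly useless, it is effectively dropped.
print(fuse(readings, variances=[1.0, 1.0, 1e9]))

# If it is wrongly believed to be as good as the others, it drags the estimate.
print(fuse(readings, variances=[1.0, 1.0, 1.0]))
```

The second call shows why the "known usefulness" assumption carries so much weight: the filter can only discount a bad sensor if it already knows the sensor is bad.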
I can argue that acquiring each sensor has a cost associated with it, but it seems like the idea of “free information” is intended to deflect that argument. Let us assume that the sensors are provided for free, and it’s just a question of “given an arbitrary number of sensors, with different usefulness, how many do you want to fuse when trying to model the correct world state?”
I think what you’ve said above implies that a rational actor should always want more sensors.
“Nevertheless, the value of free information is always greater than or equal to zero, and if free information makes you worse off, that implies somewhere there is an irrationality.”
More sensors lead to more sensor values (“information”), and the rational actor will simply use the usefulness of each sensor (which for the sake of argument we’ll assume that they know exactly) when weighting each sensor value.
In the real world, I still disagree with this claim. Computational complexity[1] exists. There is a cost to interpreting, and fusing, an arbitrary number of sensor values. Each additional sensor, even if it was provided for free, is going to incur an actual cost in computation before that value can be used to make a decision. A rational actor would not accept an arbitrary number of useless sensors if it is going to take non-zero computational cycles to disregard them.
When you include the cost of computation, now the value of those sensors is in some interval [0 - c, 1 - c], where c is how much it costs in computational effort[2] to include the sensor in your filter. In this world, sensors can have less than zero usefulness, i.e. it is actively detrimental to include the sensor in your filter. Your filter functions worse with that sensor than it does without it.
I believe the only way out of this is to ignore computational complexity and assume that c = 0, but we know that isn’t true. Consider the trivial thought experiment of me sitting here and providing you a series of useless facts about a fictional D&D campaign I’m running like, “A miraksaur is a type of dinosaur native to the planet Eurid.”, except the facts never stop. How rational would it be for you to keep trying to enter each additional value into your world state? They’re totally irrelevant, but if we ignore computational costs, there’s no downside to doing so. The reason you would be wise to tune me out in that scenario is that c is definitely greater than 0.
[2] Note that c is only fixed per value in the case where the algorithm for fusing information has linear time complexity, O(N). We often use something like an extended Kalman filter (EKF) for sensor fusion. In that scenario, each additional value incurs an increasingly higher computational cost to include, so sensors with low usefulness are especially penalized. If I recall correctly, it is O(N^2). Eventually it reaches a point where it doesn’t matter how useful a sensor is: it would be irrational to try to include it, because running the full computation would be prohibitively expensive.
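A rough sketch of the [0 - c, 1 - c] argument when the cost is not fixed per value. It assumes the total fusion cost is k * N**exponent for N fused values, following the recollection above rather than a verified property of any particular filter. The constant `k`, the cap `limit`, and all function names are hypothetical.

```python
def marginal_cost(n: int, k: float, exponent: int = 2) -> float:
    """Extra cost of going from n-1 to n fused values when total cost is k * n**exponent."""
    return k * (n ** exponent - (n - 1) ** exponent)


def net_value(usefulness: float, n: int, k: float, exponent: int = 2) -> float:
    """Usefulness in [0, 1] minus the marginal cost of fusing the n-th value."""
    return usefulness - marginal_cost(n, k, exponent)


def max_worthwhile_sensors(usefulness: float, k: float,
                           exponent: int = 2, limit: int = 10_000) -> int:
    """Largest n (capped at `limit`) for which the n-th sensor still has positive net value."""
    n = 0
    while n < limit and net_value(usefulness, n + 1, k, exponent) > 0:
        n += 1
    return n


for u in (1.0, 0.1, 0.0):
    print(u, max_worthwhile_sensors(u, k=0.001))
```

Under the quadratic assumption, even a perfectly trustworthy sensor (usefulness 1) stops paying for itself after a finite number of sensors. With `exponent=1` the marginal cost is the constant `k`, and the loop only stops at the arbitrary cap.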
If you’re worried about computational complexity, that’s OK. It’s not something that I mentioned because (surprisingly enough...) this isn’t something that any of the doctors discussed. If you like, let’s call that a “valid cost” just like the medical risks and financial/time costs of doing tests. The central issue is whether it’s valid to worry about information causing harmful downstream medical decisions.
I’m sorry, but I just feel like we’ve moved the goal posts then.
I don’t see a lot of value in trying to disentangle the concept of information from 1.) costs to acquire that information, and 2.) costs to use that information, just to make some type of argument that a certain class of actor is behaving irrationally.
It starts to feel like “assume a spherical cow”, but we’re applying that simplification to the definition of what it means to be rational. First, it isn’t free to acquire information. But second, even if I assume for the sake of argument that the information is free, it still isn’t free to use it, because computation has costs.
If a theory of rational decision making doesn’t include that fact, it’ll come to conclusions that I think are absurd, like the idea that the most rational thing someone can do is acquire literally all available information before making any decision.
“yes, your entire example of doctors is simply due to irrationality”
So first you say this.
But then you start to backtrack:
“in the trivial sense that for a poor measure the posterior of a true positive remains far smaller than it being a false positive and may not motivate a decision, shrinking the VoI towards zero, which will frequently be so small as to not justify the cost of testing”
And further admit:
“It is definitely the case that many tests cost too much for too little information and should not be run because the VoI is often zero (for a rational decision maker) and the test is simply a loss as it will not change any decisions.”
But then you try to defend the initial claim, that the doctors are being irrational:
“Nevertheless, the value of free information is always greater than or equal to zero, and if free information makes you worse off, that implies somewhere there is an irrationality.”
But we’ve already established that the tests are not free in the world we live in.
If you’re going to prove the doctors are being irrational in the world we live in, then you can’t change a core part of the problem statement. The tests do have costs—in time, in money, in available machines, in false positives that may result in surgeries or other actions with non-zero risk, and in a dozen other ways, some of which were alluded to by Dynomight, like the possibility of lawsuits.
My whole argument, which you said is “generally wrong”, is predicated on the fact that this information is not free. I don’t accept the notion that people are being irrational because they are making decisions based on the reality of the world where information is not free just because we can hypothesize about worlds where that information is free.
[1] https://en.wikipedia.org/wiki/Computational_complexity
Do you still disagree?
I agree with the other replies you’ve received about this.
You’re identifying other real costs of obtaining more information, but any information obtained, i.e. after paying whatever costs are required, still has positive value.
You’re right that the expected value of obtaining some particular information could be negative, i.e. could be more ‘expensive’ than the value of the information.
But the information itself is always valuable. Practically, we might ‘round down’ its value to zero (0). But it’s not literally zero (0).
Are you ignoring the cost of computation to use that information, as I explained here, then?
Nope!
That’s a cost to use or process the information.
There’s still a value of the information itself – or at least it seems to me like there is, if only in principle – even after it’s been parsed/processed and is ready to ‘use’, e.g. for reasoning, updating belief networks, etc.
Then I’m not sure what our disagreement is.
I gave the example of a Kalman filter in my other post. A Kalman filter is similar to recursive Bayesian estimation. It’s computationally intensive to run for an arbitrary number of values due to how it scales in complexity. If you have a faster algorithm for doing this, then you can revolutionize the field of autonomous systems + self-driving vehicles + robotics + etc.
The fact that “in principle” information provides value doesn’t matter, because the very example you gave of “updating belief networks” is exactly what a Kalman filter captures, and that’s what I’m saying is limiting how much information you can realistically handle. At some point I have to say, look, I can reasonably calculate a new world state based on 20 pieces of data. But I can’t do it if you ask me to look at 2000 pieces of data, at least not using the same optimal algorithm that I could run for 20 pieces of data. The time-complexity of the algorithm for updating my world state makes it prohibitively expensive to do that.
This really matters. If we pretend that agents can update their world state without incurring a cost of computation, and that it’s the same computational cost to update a world state based on 20 measurements as it would take for 2000 measurements, or if we pretend it’s only a linear cost and not something like N^2, then yes, you’re right, more information is always good.
But if there are computational costs, and they do not scale linearly (like a Kalman filter), then there can be negative value associated with trying to include low quality information in the update of your world state.
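As a rough worked comparison, assume (hypothetically) that one update costs $k N$ in the linear case and $k N^2$ in the quadratic case, with the same constant $k$ at both input sizes. The quadratic exponent follows the recollection in this thread, not a verified property of any particular filter. Going from 20 to 2000 values then scales the cost as follows:

```latex
\frac{C_{\text{lin}}(2000)}{C_{\text{lin}}(20)} = \frac{2000\,k}{20\,k} = 100,
\qquad
\frac{C_{\text{quad}}(2000)}{C_{\text{quad}}(20)} = \frac{2000^{2}\,k}{20^{2}\,k} = 10^{4}.
```

So under the quadratic assumption, a hundredfold increase in data costs ten thousand times as much computation per update.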
It is possible that the doctors are behaving irrationally, but I don’t think any of the arguments here prove it. Similar to what mu says on their post here.
You’re not wrong but you’re like deliberately missing the point!
You even admit the point:
“The fact that ‘in principle’ information provides value doesn’t matter”
Yes, the point was just that ‘in principle’, any information provides value.
I think maybe what’s missing is that the ‘in principle’ point is deliberately, to make the point ‘sharper’, ignoring costs, which are, by the time you have used some information, also ‘sunk costs’.
The point is not that there are no costs or that the total value of benefits always exceeds the corresponding total anti-value of costs. The ‘info profit’ is not always positive!
The point is that the benefits are always (strictly) positive – in principle.