The confidence interval in the Cepheid analysis does not inspire confidence.
Usually when a test claims “100% sensitivity”, it’s based on all members of some sample with the disease testing positive. The lower end of the 95% interval is then the lowest true sensitivity at which there would still be at least a 5% chance of getting no false negatives in that sample.
That’s where it starts to look dodgy: normally you’d put 2.5% in each tail of the distribution, but there is no tail below zero false negatives, so the whole 5% belongs to the one tail that exists. It looks like they used 2.5% anyway, incorrectly, so it’s really a 97.5% confidence interval. The other problem is that the positive sample size must have been only 29 people. That’s disturbingly small for a test that may be applied a billion times, and seriously makes me question their validation study that reported it.
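To spell out the arithmetic, assuming the usual exact binomial bound and 29 known positives that all tested positive:

```python
# Exact binomial lower bound on sensitivity when all n known positives test positive.
# If the true sensitivity were p, the chance of seeing n detections out of n is p**n,
# so the lower bound is the p at which that chance equals the chosen tail probability.

n = 29  # known-positive samples, all detected

lower_one_sided_95 = 0.05 ** (1 / n)   # the bound a one-sided 95% interval should use
lower_two_sided_95 = 0.025 ** (1 / n)  # what you get by putting 2.5% in a tail that isn't there

print(f"5% tail:   {lower_one_sided_95:.3f}")   # ~0.902
print(f"2.5% tail: {lower_two_sided_95:.3f}")   # ~0.881
```

The difference between roughly 88% and 90% isn’t huge, but splitting the tails only makes sense when there are two of them.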
There are a number of assumptions you can use to turn this into an effective false negative rate for Bayesian update purposes. You may have priors on the distribution of true sensitivities, priors on study validity, and so on. They don’t matter very much, since they mostly yield a distribution with an odds ratio geometric mean around the 15-40 range anyway. If I had to pick a single number based only on seeing their end result, I’d go with 96% sensitivity under their study conditions, whatever those were.
I’d lower my estimate for real life tests, since real life testing isn’t usually nearly as carefully controlled as a validation study, but I don’t know how much to lower it.
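To unpack the odds ratio bit: what I have in mind is the factor a negative result divides your odds of infection by. A rough sketch, where the specificity is purely a placeholder for illustration and not a figure from the review:

```python
# Sketch of turning an effective sensitivity into an update factor for a negative result.

sensitivity = 0.96   # effective sensitivity discussed above
specificity = 0.97   # placeholder for illustration, not taken from the review

# Likelihood ratio of a negative result: P(negative | infected) / P(negative | not infected)
lr_negative = (1 - sensitivity) / specificity

print(f"a negative result divides your odds of infection by ~{1 / lr_negative:.0f}")  # ~24
```

In this reading, effective sensitivities of about 93.5% and 97.5% correspond to factors of roughly 15 and 40.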
The other problem is that the positive sample size must have been only 29 people. That’s disturbingly small for a test that may be applied a billion times, and seriously makes me question their validation study that reported it.
Thanks for flagging this. The review’s results table (“Summary of findings 1”) says “100 samples” and “29 SARS-COV-2 cases”; am I correctly interpreting that as 100 patients, of which 29 were found to have COVID? (I think this is what you’re saying too, just want to make sure I’m clear on it)
I hadn’t actually read the review, but yes I meant that the sample must have had 29 people who were known (through other means) to be positive for SARS-CoV-2, and all tested positive.
If I had to pick a single number based only on seeing their end result, I’d go with 96% sensitivity under their study conditions, whatever those were.
Can you say more about how you got 96%?
Educated guessing, really. I did a few simple models with a spreadsheet for various prior probabilities, including some that were at each end of being (subjectively, to me) reasonable. Only the prior for “this study was fabricated from start to finish but got through peer review anyway” made very much difference in the final outcome. (If you have 10% or more weight on that, or on various other “their data can’t be trusted” priors, then you likely want to adjust the figure downward.)
So with a rough guess at a prior distribution, I can look at the outcomes from the point of view of “what single value has the same end effect on evidence weight as this distribution”. I make it sound fancy, but it’s really just “if there was a 30th really positive test subject in these dozen or so possible worlds that I’m treating as roughly equally likely, and I only include possible worlds where the validation detected all of the first 29 cases, how often does that 30th test come up positive?” That comes out at close to 96%.
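As a toy version of that calculation (the worlds, weights, and the untrusted-study sensitivity below are made up for illustration, not the ones from my actual spreadsheet):

```python
# Toy version of the "30th really positive test subject" calculation. Each world gets
# a prior weight, a probability of producing the observed 29/29 detections, and an
# effective sensitivity for the next true positive.

N = 29  # known positives in the validation data, all detected

worlds = [
    # (prior weight, P(29/29 detected | world), sensitivity for a 30th true positive)
    (0.15, 0.90 ** N, 0.90),
    (0.30, 0.95 ** N, 0.95),
    (0.30, 0.97 ** N, 0.97),
    (0.20, 0.99 ** N, 0.99),
    # "their data can't be trusted": a 29/29 report was likely regardless,
    # and the effective real-world sensitivity is assumed to be lower.
    (0.05, 1.00, 0.85),
]

# Posterior weight of each world: prior * P(observation | world), then renormalise.
posterior = [prior * likelihood for prior, likelihood, _ in worlds]
total = sum(posterior)

p_next = sum(p * sens for p, (_, _, sens) in zip(posterior, worlds)) / total
print(f"P(a 30th true positive tests positive) ≈ {p_next:.3f}")  # ~0.958
```

The last row is doing the work of the “their data can’t be trusted” priors mentioned above: it barely gets penalised by the 29/29 observation, so its weight grows after conditioning.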
I’m having trouble discerning this from your description and I’m curious—is this approach closely related to the approach GWS describes above, involving the beta distribution, which basically seems to amount to adding one “phantom success” and one “phantom failure” to the total tally?
It is related in the sense that if your prior for sensitivity is uniform, then the posterior is that beta distribution.
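Concretely, for the uniform-prior case:

```python
# A uniform prior on sensitivity is Beta(1, 1); after 29 detections and 0 misses the
# posterior is Beta(1 + 29, 1 + 0) = Beta(30, 1). The chance that the next true positive
# is detected is the posterior mean, which is exactly the "one phantom success, one
# phantom failure" rule.

successes, failures = 29, 0
p_next = (1 + successes) / (2 + successes + failures)
print(f"{p_next:.3f}")  # 30/31 ≈ 0.968
```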
In my case I did not have a uniform prior on sensitivity, and did have a rough prior distribution over a few other factors I thought relevant, because reality is messy. Certainly don’t take it as “this is the correct value”, and the approach I took almost certainly has some major holes in it even given the weasel-words I used.
Thanks, I appreciate this explanation!
Thanks for the info!