Re: “If the sensitivity is actually 100%, then we get a Bayes factor of 0, which is weird and unhelpful — your odds of having COVID shouldn’t go to literally 0. I would interpret this as extremely strong evidence that you don’t have COVID, though. I’d love to hear from people with a stronger statistics background than me if there’s a better way to interpret this.”
The test doesn’t actually have 100% sensitivity. That’s an estimate based on some study they ran that had some number of true positives out of some number of tests on true cases. Apparently it got all of those right, and from that they simply took the sample rate as the point estimate.
The Bayesian solution to this is to assume a prior distribution (probably a Beta(1,1)), which is updated in accordance with incoming evidence from the outcomes of tests. If the study had 30 tests (I haven’t read it since I’m on mobile, so feel free to replace that number with whatever the actual data are), that would correspond to a posterior of Beta(31,1) (note that in general Betas update by adding successes to the first parameter and failures to the second parameter, so the prior’s 1 becomes a posterior 31 after 30 successes). Taking a point estimate based on the mean of this posterior would give you a sensitivity of (n+1)/(n+2). In my toy example, that’s 31/32, or ~97%. Again, replace n with the sample size of the actual experiment.
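To make this concrete, here’s a minimal sketch in Python of the toy calculation above; the 30-case study and the 98% specificity are made-up numbers, purely for illustration:

```python
from scipy import stats

# Uniform prior over the test's sensitivity: Beta(1, 1)
prior_successes, prior_failures = 1, 1

# Toy study: 30 true COVID cases, all of them detected
detected, missed = 30, 0

# Betas update by adding successes to the first parameter and failures to the second
posterior = stats.beta(prior_successes + detected, prior_failures + missed)  # Beta(31, 1)

sensitivity = posterior.mean()  # (n + 1) / (n + 2) = 31/32
print(f"point estimate of sensitivity: {sensitivity:.3f}")  # ~0.969

# The Bayes factor for a negative result is now small but nonzero, rather than 0
specificity = 0.98  # hypothetical value, not from the study
bayes_factor = (1 - sensitivity) / specificity  # P(negative | COVID) / P(negative | no COVID)
print(f"Bayes factor of a negative test: {bayes_factor:.3f}")  # ~0.032
```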
Some notes:
A real Bayes factor would be slightly more complicated to compute, since collapsing the posterior to a point estimate involves some loss of information, but it would give very similar values in practice because a Beta is a pretty nice function.
The Beta(1,1) is probably better known as the Uniform distribution. It’s not the only prior you can use, but it’ll probably be from the beta family for this problem.
As a test with a true 100% sensitivity accumulates more data, the point estimate of its sensitivity given this method will approach 100% (since (n+1)/(n+2) approaches 1 as n approaches infinity), which is a nice sanity check.
When the test fails to detect COVID, it will increment the second parameter of the Beta distribution. For an intuition of what this distribution looks like for various values, this website is pretty good: https://keisan.casio.com/exec/system/1180573226
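As a rough illustration of that last note, here’s what a single miss does to the toy posterior from above (same made-up numbers, so treat them as placeholders):

```python
from scipy import stats

# Toy posterior after 30 detected cases and no misses: Beta(31, 1)
before_miss = stats.beta(31, 1)
print(before_miss.mean())           # ~0.969
print(before_miss.interval(0.95))   # central 95% credible interval, roughly (0.89, 1.0)

# A single failure to detect COVID increments the second parameter: Beta(31, 2)
after_miss = stats.beta(31, 2)
print(after_miss.mean())            # ~0.939, pulled down noticeably by one miss
print(after_miss.interval(0.95))
```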
I haven’t had time to read up about Beta distributions and play with the tool you linked, but I just wanted to say that I really appreciate the thorough explanation! I’m really happy that posting about statistics on LessWrong has the predictable consequence of learning more statistics from the commenters :)
Obviously this correction matters most when the point estimate of the sensitivity/specificity is 100%, which makes the corresponding Bayes factor meaningless. Do you have a sense of how important the correction is for smaller values, or how small the value can be before it’s fine to just ignore the correction? I assume everything is pulled away from extreme values slightly, but in general not by enough to matter.
Simple answer first: If the sensitivity and specificity are estimated with data from studies with large (>1000) sample sizes it mostly won’t matter.
Various details:
Avoiding point estimates altogether will get you broader estimates of the information content of the tests, regardless of whether you arrive at those point estimates from Bayesian or frequentist methods.
Comparing the two methods, the Bayesian one will pull very slightly towards 50% relative to simply taking the sample rate as the true rate. Indeed, it’s equivalent to adding a single success and a single failure to the sample and just computing the rate of correct identification in the sample (see the sketch after these notes).
The parameters of a Beta distribution can be interpreted as the total number of successes and failures, combining the prior and observed data to get you the posterior.
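Here’s a quick sketch of how fast that correction shrinks, assuming for illustration a study in which the test caught every true case:

```python
# Raw sample rate vs. the Bayesian point estimate (posterior mean under a
# Beta(1, 1) prior) for a study in which all n true cases were detected
for n in (30, 100, 1000, 4351):
    raw_rate = n / n                # point estimate straight from the sample: 100%
    bayes_rate = (n + 1) / (n + 2)  # equivalent to adding one success and one failure
    print(f"n = {n:5d}: raw = {raw_rate:.4f}, Bayesian = {bayes_rate:.4f}")
```

By n = 1000 the two estimates differ by about a tenth of a percentage point, which is the sense in which the correction mostly stops mattering.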
Thanks, I was wondering if the answer would be something like this (basically that I should be using a distribution rather than a point estimate, something that @gwillen also mentioned when he reviewed the draft version of this post).
If the sensitivity and specificity are estimated with data from studies with large (>1000) sample sizes it mostly won’t matter.
That’s the case for the antigen test data; the sample sizes are >1000 for each subgroup analyzed (asymptomatic, symptoms developed <1 week ago, symptoms developed >1 week ago).
The sample size for all NAATs was 4351, but the sample sizes for the subgroups of Abbott ID Now and Cepheid Xpert Xpress were only 812 and 100 respectively. Maybe that’s a small enough sample size that I should be suspicious of the subgroup analyses? (@JBlack mentioned this concern below and pointed out that for the Cepheid test, there were only 29 positive cases total).
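Plugging the numbers into the same Beta machinery gives a rough sense of the difference, assuming (purely for illustration) that every positive case in each study was detected:

```python
from scipy import stats

# Posterior on sensitivity under a Beta(1, 1) prior, assuming (purely for
# illustration) that every positive case in the study was detected
for positives in (29, 1000):
    posterior = stats.beta(1 + positives, 1)
    low, high = posterior.interval(0.95)  # central 95% credible interval
    print(f"{positives:4d} positives: mean = {posterior.mean():.3f}, "
          f"95% interval = ({low:.3f}, {high:.3f})")
```

With only 29 positives, sensitivities down around 88–90% are still reasonably consistent with the data, so the wider uncertainty for that subgroup does seem worth keeping in mind.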