According to this article, it’s possible to test a pool of up to (at least) 64 people for SARS-CoV-2 at once, and the test will say whether at least one person in the pool is positive. So, here’s the puzzle:
We have a large number N of people that need to be tested. Each has SARS-CoV-2 with probability P. (The cases of interest are P<5%, maybe even P<1%, see comment.) We can test any subset of the N (or maybe up to a maximum pool size S), with the test returning positive if at least person in the subset is positive. (Bonus: allow that the tests are imperfect, with a per-test false positive rate FPR and false negative rate FNR.)
Your mission is to develop a testing protocol, i.e. an algorithm for what tests to run in what order. The simplest example (proposed in that article linked above) would be: Test non-overlapping groups, and if the test is positive, test each individual within the group. But is there a better way?
Score your algorithm’s success on some combination of: (1) minimal total expected number of tests, (2) trying not to ask people for too many samples (lower priority on this one I think[1]), (3) false positive and false negative rates for individuals, (4) minimal serial time (i.e., minimal number of steps where we need results from step N to figure out what tests to run in step N+1).
(I’m not sure how to combine these criteria into a single score that reflects practical goals; if anyone knows, please comment. Or tell me if I’m leaving something out.)
Bonus: What if we have prior information, in the form of a N×N correlation matrix? (This could reflect the fact that for cohabitants / colleagues / neighbors, it’s likelier that both or neither have it.)
- ↩︎
The swabbing process is mildly unpleasant, so other things equal, better to not require lots of swabs. I don’t know whether or not it’s possible to split the sample one swab and put it into multiple test-pools.
This is a great question. However, I think we need to know more about the properties of the test before the problem is fully specified: What are the false positive and false negative rates, and how do these numbers change in the pool-testing case?
This FAQ on the CDC website links to this pdf on the FDA’s website. I don’t have the background to understand most of this, but I pasted some snippets that seem interesting below. Note that it appears this info only applies to a particular CDC-endorsed test being used in the US—I don’t know what’s in widespread use internationally.
False negative rate could be high.
It sounds as though the standard test kit includes 4 test tubes of noninfectious material that should produce a positive test result if things are working properly. This could give a general sense of what dilution levels the test continues to be accurate at.
Page 32 of the PDF has a section labelled “2019-nCoV Markers (N1 and N2)” which keeps referencing “threshold lines”. This makes me think that although the test is advertised as providing a discrete positive/negative outcome, it really gives a continuous output along with guidelines for discretization (but in theory this continuous output could be translated into a probabilistic assessment instead, if we had the necessary data and some theoretical understanding of what’s going on).
Additionally, it might be possible to make a vague guess about the likely number of infected individuals in a pool as a function of these continuous parameters, esp. if combined with info re: where each is in their disease progression (see time series note below).
There’s a “2019-nCoV rRT-PCR Diagnostic Panel Results Interpretation Guide” section which discusses the possibility of inconclusive results. However, I think Table 9 on p. 41 of the PDF weakly suggests that they are rare.
From the “Limitations” section of the PDF:
So now the problem has a time series aspect. Things keep getting more and more interesting ;)
So now we have to worry about whether each individual testee might have “amplification inhibitors” which screw up the entire batch? Hey CDC, are you sure there’s no way to do some kind of control amplification, to check if amplification is working as intended regardless of the specimen’s 2019- nCoV status?
“Priors are a thing.” Seems obvious, but if someone is going to create a web app that makes it easy to conduct pool tests, it should probably ask the user what the prevalence of 2019- nCoV is in the local population. You don’t want a user to blindly trust a prediction made based on an inaccurate base rate.
Good to know.
This section looks pretty relevant.
Tables 4 & 5 on page 37 of the PDF looks very interesting, displaying test sensitivity as a function of RNA copies. I’m not sure how to interpret the “Mean Ct” row, or what it means for a dilution to be “≥ 95% positive”. I’m also not sure why there are subtables for 2019-nCoV_N1 and 2019-nCoV_N2 (it looks like they represent two different test markers?)
I also don’t know what a typical number of RNA copies would be in an infectious specimen.
Tables 4 & 5 make me a bit pessimistic about this entire project, because they imply that a factor of 10 dilution (from roughly 3 RNA copies per μL to roughly 1 RNA copy per 3 per μL) reduces test sensitivity from ~100% to ~50%. Then again, it’s possible that these numbers are chosen because the CDC wanted to figure out the precise range that the test broke down in, and in a real-world scenario, an infectious specimen is likely to have way more than 3 RNA copies per μL (stands to reason doesn’t it?)
Anyway, the best approach may be an algorithm which takes conditional probabilities (and maybe also prior information about local prevalance) as inputs, so the algorithm can be run with minimal tweaks as more data is gathered regarding the real false positive/false negative rates + the algorithm can be used with any test whose performance characteristics are known.
An alternate frame is instead of thinking of each testee as being either positive or negative for the virus, think of each testee as having some number of 2019-nCoV RNA copies per μL of specimen (0 if they’re negative for 2019-nCoV, presumably!)
Through this lens, you could see the problem as an active learning problem where the goal is to learn a regression coefficient for the 2019-nCoV RNA concentration for each testee. If you’re using a clever algorithm powered by a good noise model, you will probably sometimes find yourself testing specimens which aren’t an even mix of different testee specimens (example: we think Patient A might have a very small number of RNA copies per μL, so we test a specimen which is 50% from Patient A but 10% from Patients B/C/D/E/F because that gives us more information).
Side note: In the course of this research, I happened to notice this note on the CDC website:
That could be a good link to use once we think we have something which is good enough to share with the US government.
EDIT: If anyone can put me in contact with someone who’s actually performed these tests, and ideally also has some knowledge of how they work under the hood, I would love that. Active learning is a topic I’m very interested in. I can program the app if you can answer all of my questions about the testing process. Send me a message via my user page with your contact info.
I set up a calendar for medical professionals to book me here:
https://calendly.com/book-john/discuss-covid-19-testing
This article provides a helpful look at optimal pooling strategies: https://arxiv.org/pdf/1007.4903.pdf
“We show that if the prevalence of positive samples is greater than 30% it is never worth pooling. From 30% down to 1% pools of size 4 are close to optimal. Below 1% substantial gains can be made by pooling, especially if the samples are pooled twice. However, with large pools the sensitivity of the test will fall correspondingly and this must be taken into consideration. We derive simple expressions for the optimal pool size and for the corresponding proportion of samples tested.”
I think it may be possible to do even better than the algorithm in this paper if your 2nd round of pooling is allowed to cut across multiple pools from your first round.
This reminds me of that cute information-theory puzzle about finding the ball with the different weight! I’m pretty dumb and bad at math, but I think the way this works is that since each test is a yes-or-no question, we can reduce our uncertainty by at most one bit with each test, and, as in Twenty Questions, we want to choose the question such that we get that bit.
A start on the simplest variant of the problem, where we’re assuming the tests are perfect and just trying to minimize the number of tests: the probability of at least one person in the pool having the ’roni in pool size S is going to be 1−(1−P)S, the complement of the probability of them all not having it. We want to choose S such that this quantity is close to 0.5, so S=log0.5log1−P. (For example, if P = 0.05, then S ≈ 13.51.)
My next thought is that when we do get a positive test with this group size, we should keep halving that group to find out which person is positive—but that would only work with certainty if there were exactly one positive (if the “hit” is in one half of the group, you know it’s not in the other), which isn’t necessarily the case (we could have gotten “lucky” and got more than one hit in our group that was chosen to have a fifty-fifty shot of having at least one) …
Partial credit???
[EDIT:
I see the monograph in gwern’s post below makes the observation that group testing is a Boolean version of compressed sensing and references Gilbert et al. 2008 which I’m just reading—found a freely available version on Google Scholar. And more importantly, the monograph contains algorithms with similar bounds to CS.
READ GWERN’S POST BELOW FIRST
]
Some ideas...
This is can be seen as a nonlinear version of a compressed sensing problem (see here for a good “25 minute” intro to compressed sensing, or the Wikipedia article).
The nonlinearity comes from the fact that the pooled test seems to behave like an approximate OR-gate: if any sample is positive the pooled sample will be positive, otherwise the pooled sample will test negative. I’m not sure how to deal with that.
The vector to be “reconstructed” is
(We can replace “has” with “is deemed by this test to have”, obviously there will be an error rate)
Let our M×N “sampling matrix” be
Then our sample results are
where ⋁ denotes the sample-taking operation, I’m using the OR symbol here assuming that’s approximately how it works.
If instead the sample response is linear and we rescale so that the sample responses count the number of positives in each sample we have
The compressed sensing idea is, if you choose the rows (sampling patterns) Ai∗ to be random and incoherent (meaning, very roughly, not too similar), you only need roughly M≈DlogND samples to reconstruct xj with “high” probability, where D is the number of nonzeros in xj . (All this hand-waving is tightened up in the literature). In other words, we only need to do roughly −NPlogP pooled samples. The method is to solve:
There are known algorithms for doing this (see the 25-minute intro), e.g. any linear programming (LP) algorithm. I’m thinking if there is a way to encode the OR-gate behaviour as LP (strictly speaking, mixed-integer programming, MIP) constraints. Surely the results are different in terms of required M and computational efficiency...but worth trying I think.
Some thought still required on how the test false-positive, false-negative, true-positive, true-negative rates propagate through the model.
Well-known researchers in this area are Terence Tao, Emmanuel Candès, and David Donoho … there are undoubtedly many others but (clearly!) I don’t have extensive knowledge of the literature.
The connection with gwern’s post below is that another way of looking at compressed sensing is “sub-Nyquist/Shannon sampling for sparse signals”—the information theory connection again. [EDIT: and the algorithms there seem to achieve similar performance bounds]
If the maximum subset size S<N then I think you can just do batch processing of batches of size S?
Test random overlapping groups, then logically deduce who isn’t infected and who how probably is. Tune group size and test count using simulations on generated data. I intuit that serial tests gain little unless P is << 1⁄64. In that case, test non-overlapping groups, then run the non-serial protocol on everyone who was in a yes-group—within those, we can update P to >= 1⁄64.