This is a great question. However, I think we need to know more about the properties of the test before the problem is fully specified: What are the false positive and false negative rates, and how do these numbers change in the pool-testing case?
This FAQ on the CDC website links to this PDF on the FDA’s website. I don’t have the background to understand most of this, but I pasted some snippets that seem interesting below. Note that it appears this info only applies to a particular CDC-endorsed test being used in the US—I don’t know what’s in widespread use internationally.
Negative results do not preclude 2019-nCoV infection and should not be used as the sole basis for treatment or other patient management decisions. Negative results must be combined with clinical observations, patient history, and epidemiological information.
False negative rate could be high.
2019-nCoV Positive Control (nCoVPC)
For use as a positive control with the CDC 2019-nCoV Real-Time RT-PCR Diagnostic Panel procedure.
It sounds as though the standard test kit includes 4 test tubes of noninfectious material that should produce a positive test result if things are working properly. This could give a general sense of the dilution levels at which the test remains accurate.
Page 32 of the PDF has a section labelled “2019-nCoV Markers (N1 and N2)” which keeps referencing “threshold lines”. This makes me think that although the test is advertised as providing a discrete positive/negative outcome, it really gives a continuous output along with guidelines for discretization (but in theory this continuous output could be translated into a probabilistic assessment instead, if we had the necessary data and some theoretical understanding of what’s going on).
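As a minimal sketch of what that translation might look like, assuming we had calibration data (the midpoint and slope values below are invented placeholders, not anything from the FDA document):

```python
import math

def p_true_positive_given_ct(ct, midpoint=40.0, slope=1.5):
    """Toy logistic mapping from a continuous Ct value to a probability
    that the specimen truly contains viral RNA.

    Lower Ct = the marker crossed the threshold line after fewer
    amplification cycles = stronger signal. The midpoint and slope are
    made-up placeholders; they'd have to be fit to calibration data
    (e.g. known-positive dilution series).
    """
    return 1.0 / (1.0 + math.exp(slope * (ct - midpoint)))

print(p_true_positive_given_ct(30.0))  # strong signal -> ~1.0
print(p_true_positive_given_ct(45.0))  # weak signal -> ~0.0006
```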
Additionally, it might be possible to make a vague guess about the likely number of infected individuals in a pool as a function of these continuous parameters, esp. if combined with info re: where each is in their disease progression (see time series note below).
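Here’s a toy version of that guess, purely to illustrate the shape of the inference (the binomial prior and the Gaussian readout model are assumptions of mine, not anything from the FDA document):

```python
import math

def posterior_num_infected(signal, n, prevalence, mu_per_infected=2.0, sigma=1.0):
    """Toy posterior over the number of infected specimens k in a pool of n,
    given a continuous test readout.

    Illustrative assumptions (mine, not the FDA's):
      - prior on k is Binomial(n, prevalence)
      - readout given k is Normal(k * mu_per_infected, sigma)
    """
    def binom_pmf(k):
        return math.comb(n, k) * prevalence ** k * (1 - prevalence) ** (n - k)

    def normal_pdf(x, mu):
        return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

    unnorm = [binom_pmf(k) * normal_pdf(signal, k * mu_per_infected) for k in range(n + 1)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Pool of 8 at 5% prevalence, with a readout suggesting roughly one infected specimen:
print(posterior_num_infected(signal=2.1, n=8, prevalence=0.05))
```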
There’s a “2019-nCoV rRT-PCR Diagnostic Panel Results Interpretation Guide” section which discusses the possibility of inconclusive results. However, I think Table 9 on p. 41 of the PDF weakly suggests that they are rare.
From the “Limitations” section of the PDF:
Optimum specimen types and timing for peak viral levels during infections caused by 2019-nCoV have not been determined. Collection of multiple specimens (types and time points) from the same patient may be necessary to detect the virus.
So now the problem has a time series aspect. Things keep getting more and more interesting ;)
A false negative result may occur if a specimen is improperly collected, transported or handled. False negative results may also occur if amplification inhibitors are present in the specimen or if inadequate numbers of organisms are present in the specimen.
So now we have to worry about whether each individual testee might have “amplification inhibitors” which screw up the entire batch? Hey CDC, are you sure there’s no way to do some kind of control amplification, to check if amplification is working as intended regardless of the specimen’s 2019-nCoV status?
Positive and negative predictive values are highly dependent on prevalence. False negative test results are more likely when prevalence of disease is high. False positive test results are more likely when prevalence is moderate to low.
“Priors are a thing.” Seems obvious, but if someone is going to create a web app that makes it easy to conduct pool tests, the app should probably ask the user what the prevalence of 2019-nCoV is in the local population. You don’t want a user to blindly trust a prediction based on an inaccurate base rate.
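For concreteness, here’s a minimal Bayes’-rule sketch of why the base rate matters so much (the sensitivity and specificity numbers are made up):

```python
def posterior_infected(prior, sensitivity, specificity, positive_result):
    """Posterior P(infected | test result) via Bayes' rule.

    prior: P(infected) before testing, i.e. local prevalence
    sensitivity: P(test positive | infected)
    specificity: P(test negative | not infected)
    """
    if positive_result:
        num = sensitivity * prior
        den = num + (1 - specificity) * (1 - prior)
    else:
        num = (1 - sensitivity) * prior
        den = num + specificity * (1 - prior)
    return num / den

# Made-up performance numbers. At 1% prevalence, even a fairly specific test
# gives a positive predictive value of only ~32%:
print(posterior_infected(prior=0.01, sensitivity=0.95, specificity=0.98, positive_result=True))
print(posterior_infected(prior=0.01, sensitivity=0.95, specificity=0.98, positive_result=False))
```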
If the virus mutates in the rRT-PCR target region, 2019-nCoV may not be detected or may be detected less predictably. Inhibitors or other types of interference may produce a false negative result. An interference study evaluating the effect of common cold medications was not performed.
Good to know.
Performance Characteristics
This section looks pretty relevant.
LoD studies determine the lowest detectable concentration of 2019-nCoV at which approximately 95% of all (true positive) replicates test positive. The LoD was determined by limiting dilution studies using characterized samples.
...
Tables 4 & 5 on page 37 of the PDF look very interesting, displaying test sensitivity as a function of RNA copies. I’m not sure how to interpret the “Mean Ct” row, or what it means for a dilution to be “≥ 95% positive”. I’m also not sure why there are separate subtables for 2019-nCoV_N1 and 2019-nCoV_N2 (it looks like they represent two different test markers?)
I also don’t know what a typical number of RNA copies would be in an infectious specimen.
Tables 4 & 5 make me a bit pessimistic about this entire project, because they imply that a factor-of-10 dilution (from roughly 3 RNA copies per μL to roughly 1 RNA copy per 3 μL) reduces test sensitivity from ~100% to ~50%. Then again, it’s possible these concentrations were chosen because the CDC wanted to pin down the precise range in which the test breaks down, and in a real-world scenario an infectious specimen is likely to have far more than 3 RNA copies per μL (stands to reason, doesn’t it?)
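To make the dilution worry concrete, here’s a toy model of pooled-test sensitivity. The detection curve is my guess, loosely anchored to the two Table 4/5 data points mentioned above (~50% at ~0.3 copies/μL, near 100% at ~3 copies/μL); real assay behavior may look quite different:

```python
def p_detect(copies_per_ul, c50=0.3, hill=2.0):
    """Toy detection curve: probability that the rRT-PCR calls a specimen
    positive, as a function of RNA concentration.

    c50 (concentration at 50% detection) and hill (steepness) are guesses
    loosely anchored to Tables 4 & 5: ~50% at ~0.3 copies/μL, ~99% at ~3.
    """
    return copies_per_ul ** hill / (c50 ** hill + copies_per_ul ** hill)

def pooled_sensitivity(copies_per_ul, pool_size):
    """One infected specimen mixed evenly into a pool of pool_size specimens
    gets diluted by a factor of pool_size."""
    return p_detect(copies_per_ul / pool_size)

# A strongly positive specimen (30 copies/μL, an invented figure) in pools of various sizes:
for n in (1, 5, 10, 50):
    print(n, round(pooled_sensitivity(30.0, n), 3))
```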
Anyway, the best approach may be an algorithm which takes conditional probabilities (and maybe also prior information about local prevalence) as inputs, so that it can be rerun with minimal tweaks as more data is gathered about the real false positive/false negative rates, and so that it works with any test whose performance characteristics are known.
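A sketch of what the core of such an algorithm might look like for a single pooled test, under simplifying assumptions of my own (testees are independent, and dilution effects are folded into the supplied sensitivity):

```python
def update_pool(priors, sensitivity, specificity, pool_positive):
    """Posterior infection probability for each pool member after one pooled test.

    Simplifying assumptions (mine, not the CDC's): members are independent,
    and the pooled test acts like a single test for "at least one member
    infected", with dilution effects already folded into sensitivity.
    """
    def p_pool_positive(p_any_infected):
        return sensitivity * p_any_infected + (1 - specificity) * (1 - p_any_infected)

    posteriors = []
    for i, p in enumerate(priors):
        # If member i is infected, the pool certainly contains the virus.
        p_pos_if_i = sensitivity
        # If member i is clean, the pool can only be positive via the others.
        p_none_others = 1.0
        for j, q in enumerate(priors):
            if j != i:
                p_none_others *= 1 - q
        p_pos_if_not_i = p_pool_positive(1 - p_none_others)
        if not pool_positive:
            p_pos_if_i, p_pos_if_not_i = 1 - p_pos_if_i, 1 - p_pos_if_not_i
        posteriors.append(p * p_pos_if_i / (p * p_pos_if_i + (1 - p) * p_pos_if_not_i))
    return posteriors

# 10-person pool, 2% local prevalence, made-up test characteristics.
# A positive pool raises each member's probability from 2% to ~10%:
print(update_pool([0.02] * 10, sensitivity=0.9, specificity=0.98, pool_positive=True))
```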
An alternate frame: instead of thinking of each testee as either positive or negative for the virus, think of each testee as having some number of 2019-nCoV RNA copies per μL of specimen (0 if they’re negative for 2019-nCoV, presumably!)
Through this lens, you could see the problem as an active learning problem where the goal is to learn a regression coefficient for each testee: their 2019-nCoV RNA concentration. If you’re using a clever algorithm powered by a good noise model, you will probably sometimes find yourself testing specimens which aren’t an even mix of different testees’ specimens (example: we think Patient A might have a very small number of RNA copies per μL, so we test a specimen which is 50% from Patient A and 10% each from Patients B/C/D/E/F, because that gives us more information).
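As a gesture at what such an algorithm might compute internally, here’s a deliberately crude sketch: one unknown patient, a two-point prior on their RNA concentration, and a comparison of pool designs by expected posterior entropy (every number here is invented):

```python
import math

def entropy(ps):
    return -sum(p * math.log(p) for p in ps if p > 0)

def expected_posterior_entropy(weight_a, prior_a=(0.9, 0.1), other_load=0.2, sigma=0.5):
    """Expected entropy of our belief about Patient A after one pooled test.

    Crude model (all assumptions mine): Patient A's concentration is either
    0 or 1 copies/μL with prior prior_a; the readout is
    weight_a * conc_A + other_load + Gaussian(0, sigma) noise.
    Lower expected entropy = more informative pool design.
    """
    concs = (0.0, 1.0)
    grid = [i * 0.05 for i in range(-60, 100)]  # readout values to average over

    def lik(y, c):  # Gaussian likelihood; the constant factor cancels below
        mu = weight_a * c + other_load
        return math.exp(-((y - mu) ** 2) / (2 * sigma ** 2))

    total, acc = 0.0, 0.0
    for y in grid:
        joint = [p * lik(y, c) for p, c in zip(prior_a, concs)]
        z = sum(joint)
        if z > 0:
            acc += z * entropy([j / z for j in joint])
            total += z
    return acc / total

# Devoting more of the pool volume to Patient A buys more information about them:
for w in (0.1, 0.3, 0.5):
    print(w, round(expected_posterior_entropy(w), 3))
```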
Side note: In the course of this research, I happened to notice this note on the CDC website:
I believe that I have found a treatment or vaccine for COVID-19. Is CDC the best place to submit my idea?
BARDA is providing a portal to support U.S. government medical countermeasure research and development. Interested stakeholders can learn more here.
That could be a good link to use once we think we have something which is good enough to share with the US government.
EDIT: If anyone can put me in contact with someone who’s actually performed these tests, and ideally also has some knowledge of how they work under the hood, I would love that. Active learning is a topic I’m very interested in. I can program the app if you can answer all of my questions about the testing process. Send me a message via my user page with your contact info.
I set up a calendar for medical professionals to book me here:
https://calendly.com/book-john/discuss-covid-19-testing