Genetic Sequencing of Wastewater: Prevalence to Relative Abundance

Back in September I wrote:

In thinking about how you might identify future pandemics by sequencing wastewater, you might have a goal of raising an alert before some fraction of people were currently infected. What you’re actually able to observe, however, are sequencing reads, several steps removed from infection rates. Can we use covid data to estimate how the fraction of people currently infected with some pathogen might translate into the fraction of wastewater sequencing reads that match the pathogen?

In that post I looked at a single pathogen (SARS-CoV-2) in a single metagenomic sequencing dataset (Rothman et al 2021) and got a very rough point estimate (2.3e-8 relative abundance at 0.1% prevalence). What fraction of sequencing reads might come from a novel pathogen at a given prevalence is still a key question, however, and this quarter I’m working with several other people at the NAO to get a better understanding of it.

Specifically, we’d like to understand how relative abundance (the fraction of sequencing reads matching an organism) varies with prevalence (what fraction of people are currently infected) and with organism (ex: since we’re sampling wastewater, you’d expect disproportionately more reads from gastrointestinal than from blood pathogens).
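As a minimal sketch of the relative abundance definition, assuming per-sample classifier output is available as a hypothetical dict of taxon name to read count, the main subtlety is the denominator: it should be all reads in the sample, including the ones the classifier couldn't assign, not just the classified ones. The counts below are made up for illustration.

```python
def relative_abundance(taxon_counts: dict[str, int], total_reads: int, taxon: str) -> float:
    """Fraction of all reads in a sample assigned to `taxon`.

    `total_reads` should count every read that survived cleaning, including
    unclassified reads, not just the ones assigned to some taxon.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    # Taxa absent from the classifier output have zero matching reads.
    return taxon_counts.get(taxon, 0) / total_reads


# Hypothetical counts for a single sample (illustrative, not real data).
counts = {"SARS-CoV-2": 3, "Norovirus GII": 120}
print(relative_abundance(counts, total_reads=10_000_000, taxon="SARS-CoV-2"))
```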

Here’s the current plan:

1. Gather wastewater metagenomic sequencing data, mostly from papers that published it in the Sequence Read Archive (SRA). I’d love it if we could also include our own data here, but we aren’t far enough along to have much yet.
2. Process the sequencing data (code) to clean it (remove adapters, trim low-quality bases, collapse paired-end reads) and identify the reads (assign them to taxonomic nodes).
3. Gather corresponding estimates for the prevalence of various human viruses in the populations contributing to the metagenomic data. (code)
4. Build and fit a model for relative abundance as a function of prevalence, sequencing method, and the type of organism.
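For a rough sense of what step (4) involves, here is a minimal sketch under a strong simplifying assumption: relative abundance follows a power law in prevalence, so a straight line fits in log-log space. It drops sequencing method and organism type entirely, and is not the model the team is building. The helper names are hypothetical, and the data are synthetic, constructed so that 0.1% prevalence gives the 2.3e-8 point estimate from the September post.

```python
import numpy as np


def fit_log_log(prevalence, relative_abundance):
    """Least-squares fit of log10(RA) = intercept + slope * log10(prevalence).

    Hypothetical simplification: a real model would also include sequencing
    method and organism type as predictors.
    """
    x = np.log10(np.asarray(prevalence, dtype=float))
    y = np.log10(np.asarray(relative_abundance, dtype=float))
    # np.polyfit returns coefficients highest power first: [slope, intercept].
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


def predict_ra(slope, intercept, prevalence):
    """Predicted relative abundance at a given prevalence under the fitted line."""
    return 10 ** (intercept + slope * np.log10(prevalence))


# Synthetic, made-up data where RA is exactly proportional to prevalence.
prev = np.array([1e-4, 1e-3, 1e-2])
ra = 2.3e-5 * prev
slope, intercept = fit_log_log(prev, ra)
print(slope, predict_ra(slope, intercept, 1e-3))
```

A slope near 1 would mean relative abundance scales linearly with prevalence; fitting the slope rather than assuming it lets the data say whether that holds.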

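Overall, this would be a big step toward estimating the feasibility of this kind of detection: cost should be inversely proportional to relative abundance, since halving the fraction of reads that match a pathogen doubles the sequencing needed to see the same number of them.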
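To make the inverse relationship concrete, here is a small sketch, assuming each read independently matches the pathogen with probability equal to its relative abundance (so matching-read counts are roughly Poisson). The function names are illustrative, and the 2.3e-8 figure is the rough September estimate, not a new result.

```python
import math


def reads_needed(relative_abundance: float, expected_matching_reads: float = 1.0) -> float:
    """Total reads to sequence so that, on average, `expected_matching_reads`
    of them come from the pathogen."""
    if not 0 < relative_abundance <= 1:
        raise ValueError("relative_abundance must be in (0, 1]")
    return expected_matching_reads / relative_abundance


def p_at_least_one(relative_abundance: float, total_reads: float) -> float:
    """Chance of seeing at least one matching read, treating the count as Poisson."""
    return 1 - math.exp(-relative_abundance * total_reads)


ra = 2.3e-8  # rough estimate at 0.1% prevalence from the September post
n = reads_needed(ra)
print(f"{n:.3g} reads for one expected match")
print(f"P(at least one match at that depth) = {p_at_least_one(ra, n):.2f}")
```

At that estimate you'd need about 43 million reads just to expect a single matching read, and even then there's only about a 63% (1 - 1/e) chance of seeing one. At a fixed price per read, cost scales with `reads_needed`, which is 1/RA.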

We’re reasonably far along on (1) and (2), and if you’re curious you can poke around. That shows the counts for human-infecting viruses across samples. It’s rough (ex: we’re not doing any correction for PCR duplication yet) so don’t take it too seriously and let us know if you see something suspicious. On (3) and (4) things are much earlier: we currently have prevalence estimates for five viruses and I’d like to get at least ten times this many.

(If you’re curious why I haven’t been talking more about writing a book since my post a month ago, this is a lot of it. Right around when I posted that, I moved from mostly doing individual work to leading this project, and the opportunity cost of taking time away became much higher. I do still want to write something summarizing the advice people gave me about making a book, though, and it’s possible I’ll come back to the book project.)

This post describes in-progress work at the NAO and covers work from a team including Simon Grimm and Asher Parker-Sartori estimating prevalences, Dan Rice modeling, Will Bradshaw evaluating sequencing methods, and Mike McLaren identifying relevant papers and providing general technical guidance.
