On coincidences and Bayesian reasoning, as applied to the origins of COVID-19

(Or: sometimes heuristics are no substitute for a deep dive into all of the available information).

This post is a response to Roko’s recent series of posts (Brute Force Manufactured Consensus is Hiding the Crime of the Century, The Math of Suspicious Coincidences, and A Back-Of-The-Envelope Calculation On How Unlikely The Circumstantial Evidence Around Covid-19 Is); however, I made a separate post for a few reasons.

  1. I think it’s in-depth enough to warrant its own post, rather than a series of comments

  2. It contains content that is not just a direct response to these posts

  3. It’s important, because those posts seem to have gotten a lot of attention and I think they’re very wrong.

Additional note: Much of this information is from the recent Rootclaim debate; if you’ve already seen that, you may be familiar with some of what I’m saying. If you haven’t, I strongly recommend it. Miller’s videos have fine-grained topic timestamps, so you can easily jump to sections that you think are most relevant.

The use of coincidences in Bayesian reasoning

A coincidence, in this context, is some occurrence that is not impossible under some hypothesis, and does not outright violate it, but is a priori unlikely because it involves 2 otherwise unrelated things actually occurring together or in some relationship. For example, suppose I claimed to shuffle a deck of cards, but when you look at it, it is actually in some highly specific order; it could be 2 through Ace of spades, then clubs, hearts, and diamonds. The probability of this exact ordering, like that of any specific ordering, is 1/52! from a truly random shuffle. Of course, by definition, every ordering is equally likely. However, there is a seeming order to this shuffle which should be rare among all orderings.

In order to formalize our intuition, we would probably rely on some measure of “randomness” or some notion related to entropy, and note that most orderings score much higher on this metric than ours. Of course, a few other orderings are similarly rare (e.g. permuting the order of suits, or having all 2s, then all 3s, etc., each in suit order), but probably only a few dozen or a few hundred. So we say that “the probability of a coincidence like this one” is < 1000/52!, which is still fantastically tiny, and thus we have strong evidence that the deck was not shuffled randomly. On the other hand, maybe I am an expert at sleight of hand and could easily sort the deck, say with probability 10%. Mathematically, we could say something like

P(an ordering at least this surprising | fair shuffle) < 1000/52! ≈ 1.2 × 10^-65

and similarly for the alternative hypothesis, that I manipulated the shuffle: P(such an ordering | sleight of hand) ≈ 0.1.
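To make the arithmetic concrete, here is a minimal Python sketch of the resulting Bayes factor, using the rough 1000-orderings bound and the assumed 10% sleight-of-hand success rate from above:

```python
from math import factorial

# Rough bound on "an ordering this surprising" under a fair shuffle,
# using the ~1000 similarly-special orderings estimated above.
p_given_fair_shuffle = 1000 / factorial(52)

# Assumed success rate for a sleight-of-hand expert (illustrative).
p_given_manipulation = 0.10

bayes_factor = p_given_manipulation / p_given_fair_shuffle
print(f"Bayes factor for manipulation: {bayes_factor:.2e}")  # ~8.07e+63
```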

On the other hand, we might have a much weaker coincidence. For example, we could see 4 cards of the same value in a row somewhere in the deck, which has probability about 1/425 (assuming https://www.reddit.com/r/AskStatistics/comments/m1q494/what_are_the_chances_of_finding_4_of_a_kind_in_a/ is correct). This is weird, but if you shuffled decks of cards on a regular basis, you would find such an occurrence fairly often. If you saw such a pattern on a single draw, you might be suspicious that the dealer was a trickster, but not enough to overcome strong evidence that the deck is indeed random (or even moderate evidence, depending on your prior).

However, if we want to know the probability of some coincidence in general, that’s more difficult, since we haven’t defined what “some coincidence” is. For example, we could list all easily-describable patterns that we might find, and say that any pattern with a probability of at most 1/100 under a given shuffle is a strange coincidence. So if we shuffle the deck and find such a coincidence, what’s the Bayes Factor in favor of a stacked deck? 100x, right? Or if we get 2 such coincidences, it’s 10,000x, right?

No! The probability of finding any such coincidence is higher than 1/100, possibly much higher. Figuring out the true probability may be difficult. For one, the events may not be independent or mutually exclusive. In particular, we can’t just multiply probabilities if we have more than 1 coincidence: for example, suppose we have a lot of hearts and spades in the top half of the deck; can we then also note that we have a lot of clubs and diamonds in the bottom half, say that we have 2 suspicious coincidences, and square P(many hearts and spades in the top half) to get the probability of this outcome? Of course not. And of course, P(any of N outcomes X_1...X_N) is generally larger than P(X_i) for any given i.

The standard way to really be confident in our result is to consider all of the outcomes that would have surprised us by a similar or greater amount, and ask about the probability of any such outcome, as we did above. And we also have to be careful about what we consider a “coincidence” (or multiple coincidences), since A) sometimes things are more likely than they seem; B) sometimes multiple coincidences are actually one coincidence; and C) the more things you look at, the more likely at least one of them is to be “unlikely” (essentially, avoid p-hacking). In practice, of course, this is quite difficult, especially in cases where there was very little pre-registration, but it should at least be considered.
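As a toy illustration of why one observed pattern is not a 100x Bayes factor, here is a small Python sketch. It assumes, purely for simplicity, that the scanned-for patterns are independent; real patterns overlap, which changes the numbers but not the moral:

```python
# Probability of seeing at least one "coincidence" when scanning a fair
# shuffle for N different patterns, each with probability 1/100.
# Independence between patterns is assumed here purely for illustration.
p_single = 0.01

for n_patterns in (1, 10, 50, 100):
    p_any = 1 - (1 - p_single) ** n_patterns
    print(f"{n_patterns:3d} patterns scanned -> P(some coincidence) = {p_any:.3f}")

# With 100 candidate patterns, P(some coincidence) is ~0.634, so a single
# observed coincidence is nowhere near 100:1 evidence for a stacked deck.
```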

None of this is special

There isn’t anything special about a “coincidence” that changes how we analyze this situation. We just care about how unlikely some outcome is under each hypothesis. It doesn’t really matter whether our explanation for the mismatch between theory and evidence is coincidence, measurement error (which is just another form of bad luck), malicious manipulation of data, p-hacking/cherry-picking, or something else. All that really matters is what P(evidence|hypothesis) is for each hypothesis under consideration. Our usual tools for analyzing data mostly apply here as they do in any other case.

The coincidence of Covid-19 starting in Wuhan

Roko writes:

How many times do you have to rerun history for a naturally occurring virus to randomly appear outside the lab that’s studying it at the exact time they are studying it? I think it’s at least 1000:1 against.

As stated literally, this question is rather difficult to answer, for several reasons: the variables are not independent, pandemics are rare, and the question is ambiguously worded. It would not be surprising if a lab studied pathogens that are likely to be found nearby. It also would not be surprising if they were to study those viruses over an extended period of time.

The follow-up post looks at essentially the same variables:

  1. Coincidence of Location: Wuhan is a particularly special place in China for studying covid-19; the WIV group was both the most important, most highly-cited group before 2020, and the only group that was doing GoF on bat sarbecoronaviruses as far as I know. Wuhan is about 0.5% of China’s population. It’s a suspicious coincidence that a viral pandemic would occur in the same city as the most prominent group that studies it.

  2. Coincidence of timing: several things happened that presaged the emergence of covid-19. In December 2017, the US government lifted a ban on risky pathogen research, and in mid-2018 the Ecohealth group started planning how to make covid in the DEFUSE proposal. A natural spillover event could have happened at any time over either the last, say, 40 years or (probably) the next 40 years, though likely not much before that due to changing patterns of movement (I need help on exactly how wide this time interval is).

  3. Warnings turning out to be accurate: Warnings were given in Nature specifically mentioning the WIV/​Zhengli Shi group and no other group involved with coronaviruses, and only a few other groups involved with any viruses at all (in other articles). There were hundreds of groups that could have been warned about I think, but this article gives 59 as the number of BSL-4 labs around the world. This is a subtler point than those above because getting a warning is extra evidence for the lab leak hypothesis even conditional on the timing and location coincidence. Warnings were also given about WIV itself independent of the connection to coronaviruses too.

(I’m leaving out point 4 for now since Roko outsources the Bayes Factors; we’ll get back to that).

The first issue here is really just one of facts. Not everywhere in China is equally likely to be the source of a natural pandemic. Simulations indicate that pandemics like Covid are much more likely to start in urban areas (or, to be specific, they are likely to go extinct if they start in rural areas), and historically this has been the case as well (e.g. SARS-1 started in a city). In addition, South and Central China are closer to wildlife like bats, civets, and raccoon dogs, and there is a thriving wildlife trade in many of these cities, while the same is not true in Northern China. We should also include other parts of Southeast Asia; Roko estimates 700 million people living within the distance of Wuhan from a plausible origin location. That seems reasonable to me; China is around 60% urban, so maybe somewhere around 400 million people are plausible candidates for patient 0. 11/400 = 2.75%.

Wuhan is a particularly big city and a transportation hub, and a lot of wildlife passes through it. The likelihood of a bat coronavirus pandemic starting in Wuhan may actually be even higher than this, say 5%. On the other hand, maybe there are reasons why Wuhan is less likely, but these have to be demonstrated in order to claim we have a highly suspicious coincidence. A claim of a strong Bayes Factor requires a very careful argument (see Confidence levels inside and outside an argument). Do you think the probability that the location arguments in Roko’s posts are wrong is less than 1/200?
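As a rough sensitivity check, here is how much the location factor moves depending on which baseline you pick. All inputs are the estimates discussed above, not measured quantities:

```python
# How much does P(a natural pandemic starts in Wuhan) depend on the baseline?
wuhan_population = 11e6

baselines = {
    "Roko's share of all of China":      0.005,                    # 0.5% of China
    "share of ~400M urban candidates":   wuhan_population / 400e6,  # ~2.75%
    "adjusted for hub / wildlife trade": 0.05,                     # guess from the text
}

for label, p in baselines.items():
    print(f"{label:35s}: {p:.2%}")

# The gap between 0.5% and ~3-5% is an order of magnitude in the implied
# Bayes factor, before any of the harder questions are even addressed.
```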

Timing

The discussion of timing returns to the nominal topic of this post, coincidences. First, the “coincidence” of specific, relevant work being done at WIV just before the pandemic started is based on a rejected grant proposal from 2018. What is the probability of a lab having at least one proposal like this in the previous few years? From the information in any of these posts, it’s impossible to tell. Maybe every big virology lab is regularly submitting grant proposals for similar research. And a rejected grant proposal doesn’t mean that specific work has actually been done, so you should consider not only grant proposals, but also published research, unpublished or in-progress research, and other weak evidence like conference talks, interviews, and maybe even social media posts or emails. No analysis of the lab leak hypothesis that I am aware of gives any indication that such a search has been conducted, so it is impossible to assign a meaningful value to P(bat coronavirus pandemic starts in a city with a lab doing research on bat coronaviruses), since for all we know every city in China meets this criterion.

Also, remember what we said about counting any of several different things as a coincidence? Let’s take a look at why the timing is supposed to be so suspicious:

But gain of function is a new invention—it only really started in 2011 and funding was banned in 2014, then the moratorium was lifted in 2017. The 2011-2014 period had little or no coronavirus gain of function work as far as I am aware. So coronavirus gain of function from a lab could only have occurred after say 2010 and was most likely after 2017 when it had the combination of technology and funding

If we had had a similar pandemic in 2012, right when this sort of research became possible, would that also have been a suspicious coincidence? Unclear, but there’s certainly some possibility. In fact, is the 2014 moratorium even relevant? Alina Chan says no, in which case 2019 isn’t anything special, and you have something like 8 years since the relevant research apparently became possible, not 2. Maybe she’s wrong, but you should discount the 2/80 number by however likely you think she is to be correct.
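One hedged way to do that discounting is to treat the timing factor as a mixture over whether the moratorium matters. A minimal sketch, where the 80-year window and the 2- vs. 8-year “suspicious” spans are the figures from the discussion above, and P(Chan is right) is left as a free input:

```python
def timing_factor(p_chan_correct: float) -> float:
    """Expected P(timing 'coincidence'), as a mixture over whether the
    2017 moratorium lift actually matters for when a leak could happen."""
    window_if_moratorium_matters = 2 / 80   # Roko's 2-years-in-80 figure
    window_if_it_does_not = 8 / 80          # ~8 years since GoF became feasible
    return (p_chan_correct * window_if_it_does_not
            + (1 - p_chan_correct) * window_if_moratorium_matters)

for p in (0.0, 0.5, 0.9):
    print(f"P(Chan is right) = {p:.1f} -> timing factor = {timing_factor(p):.4f}")
# 0.0 -> 0.0250, 0.5 -> 0.0625, 0.9 -> 0.0925
```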

Actually, we have to go further. The proposal emphasized work being done at UNC, which has a lot more experience manipulating viruses. So we have to consider a very wide range of types of work that would be considered “suspicious.” I’m just going to link Miller’s slides from day 1 (go to slide 41), which mention “suspicious-sounding” research in many cities in China, links to the EcoHealth Alliance, and even the addition of an FCS to coronaviruses at other labs, all in the past few years. Once you start assuming that work was done without good evidence, it’s easy to assert that just about any lab could be involved. With these standards, the answer to the question “How many times do you have to rerun history for a naturally occurring virus to randomly appear outside the lab that’s studying it at the exact time they are studying it?” looks like it would actually be close to 1; it’s certainly much more likely than just “how likely is it that Covid started in Wuhan specifically?” In any event, Roko gives 2 out of 80 (even though the yearly rate of natural virus pandemics out of South and Central China seems to be more like 3%);[1] again, does it seem like the chance that this argument is wrong is really much less than 2.5%?

Independence

You might note that I’m talking about location again, rather than timing. This is partially because Roko included the grant proposal under the timing section, but also because I don’t think these factors are independent, so you can’t just multiply numbers together. The DEFUSE grant proposal impacts not just the timing you should consider, but also the location.

Viral features

The last major piece of evidence that Roko cites is based on molecular and genetic features. We can apply similar tests as above.

First, how do we know what counts as suspicious? Nothing was pre-registered, so we really don’t have a good sense of how many “suspicious coincidences” would be found in a random virus. Or, really, how many there would be in a virus that caused a major human pandemic, because any such virus must be rare on some measure; most viruses don’t cause human pandemics. Do you think that a motivated reasoner could find some suspicious patterns in a given virus if they really wanted to? Roko takes his source’s 1/30 million and rounds down to 1/500, but do you really think that this exercise would result in a positive finding only a fraction of a percent of the time?

Second, do we even think that any of these claims hold up? The source seems to mostly focus on food and drug related topics, not virology, so let’s take a careful look at their justification.

SARS-CoV-2 has a furin cleavage site positioned in the spike protein at the S1/​S2 junction. The furin cleavage site supercharged the virus into the worst pandemic pathogen in a century. Virologists have yet to identify one in any other related coronavirus.

Roko cites a tweet in one of his posts to the same effect, saying that none of the 800 other known sarbecoviruses have an FCS, so “p-value < 0.002.” But again, Covid isn’t a random virus. The whole reason the FCS is deemed relevant is that it impacts how the virus affects humans; if only 1 in 800 of these viruses has caused a pandemic, we would definitely expect that one to have some features that are rare among the others. Otherwise they would have already caused other pandemics!

Second, FCSs are fairly common among other coronaviruses. They’re also spread throughout the tree, intermingled with non-FCS viruses, suggesting they evolved multiple times. I don’t know how to assign a probability here, because there’s a fair amount of arbitrariness in what counts as a “separate virus” or the “same family,” and because evolution is a complicated dynamical system. But nothing remotely like 1 out of several hundred is justified without further argument.

SARS-CoV-2 emerged highly infectious without evolving much in humans

Their main citation here seems to be… the Daily Mail, quoting a “Trump aide.” Their other source is one of their own articles, but the evidence quoted is far too vague to be putting big numbers on.

The genome of SARS-CoV-2 falls within the range of a 25 percent genetic difference from SARS.

I can’t tell what this means, but Roko seems to ignore it.

(Edit: I think this is saying that WIV was looking at viruses within 25% genetic distance of SARS-1, and so this virus would have been a valid candidate for study.)

Their last claim is based on this preprint, which I don’t have the ability to analyze, though even USRTK’s summary admits other experts aren’t convinced. Roko gives it 1/1000 before rounding the combined effect down to 1/500; again, I don’t think an unpublished preprint like this justifies anything like that level of confidence.

Intermezzo: Bias in sources

In a recent post (Most experts believe COVID-19 was probably not a lab leak) many commenters, including Roko, expressed skepticism at taking the results of a survey of experts at face value. This is fair (apparently a substantial fraction of respondents had claimed to be familiar with a fake study, for example); however, scrutiny should be applied to all sources. Who are the 3 authors of the preprint linked above, and why should we trust them? What about USRTK? Is Richard H. Ebright less biased than whoever was surveyed?

If we’re going to be skeptical of potentially biased people as a source of information, then we have to apply that same standard to any person, even if we agree with them. It feels weird saying this in 2024 on a rationalist forum, since this seems like one of the most basic principles of rationality, period. But apparently we need a reminder.

These sorts of arguments are easy to make

Without careful consideration, it’s easy to come up with arguments that imply very strong Bayes factors. For example:

  1. What is the probability, under a lab leak, of all of the known early cases being located at or clustered around a market on the other side of town? A market that keeps and sells wild animals, which is the kind of place where SARS-1 started? And which is, in fact, the exact place that at least one virologist (Edward Holmes) identified years ago as a likely starting point of a viral pandemic!

  2. In fact, there appear to have been 2 separate spillover events. No early cases cluster around any other location, such as the WIV, so this already suspicious event essentially happened twice!

How many different public, well-trafficked indoor places are there in Wuhan, a city of 11 million people? It has to be at least 1,000, right? So that’s 1/1000 against, maybe 2/1000 if you account for there being a few other small wet markets. But both early crossover events resulted in cases only at the same market (i.e. we didn’t have 1 cluster at each of 2 different markets), so maybe it really is 1/1000, and then you square it to get 1/1,000,000. And it could easily be a lot less than that if there are more than 1,000 possible spreading locations. Or if we note that 4 of the first 5 known cases worked at the market, rather than just visiting it, and use workers at market/Wuhan population = about 1/10,000.
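For concreteness, the same back-of-the-envelope style in Python. Every input is a guess from the bullets and paragraph above, purely illustrative:

```python
# Guessed number of public, well-trafficked indoor venues in Wuhan.
n_venues = 1_000

p_cluster_at_hsm = 1 / n_venues                    # one spillover clustering at HSM
p_both_spillovers_at_hsm = p_cluster_at_hsm ** 2   # two independent events: 1e-6

# Alternative cut: 4 of the first 5 known cases *worked* at the market.
p_patient0_works_at_market = 1 / 10_000            # market workers / Wuhan population

print(f"both clusters at HSM: {p_both_spillovers_at_hsm:.0e}")
print(f"patient 0 works there: {p_patient0_works_at_market:.0e}")
```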

I doubt anyone takes these numbers seriously

There are plenty of obvious objections to the argument above (just as there are objections to the pro-lab-leak arguments). The biggest one is the fact that many cases were missed, especially early on. This isn’t a guess; given Covid’s hospitalization rate, and the fact that no one was looking for it in mid-December 2019, it’s pretty much a guarantee that many people were infected at the time but did not know it. However, the mere fact that some cases are unknown doesn’t mean the clustering around the market is wrong. Cases spread outwards from the market over time, with no other clear center point. There are no other initial clusters of cases (most notably, nothing near the lab). And there is a limit to the number of “missing cases” there can be, because we know how fast it spreads (about 2 doublings a week with no mitigation). If there were actually 50 cases severe enough to be hospitalized on December 10th, instead of the handful we know about, then on January 23rd, when Wuhan was locked down, there would have been about 50*2^(6*2) = 200,000 hospitalizations and, given its death rate, around 40,000 deaths. There’s some error in these numbers, but the actual number of deaths was nowhere near that, even a month later. You can miss cases, but the death count is not going to be 10x what you think it is.

(Also, although cases could be missing, there’s no reason to expect that hospitalizations would be biased toward the market if cases are not).
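To make the growth arithmetic above concrete, here is a short sketch. The 2-doublings-per-week rate is from the text; the ~20% deaths-per-hospitalization ratio is my back-solved assumption from the 200,000 and 40,000 figures, not a measured quantity:

```python
# Project forward from a hypothetical 50 hospitalized cases on Dec 10, 2019.
initial_hospitalized = 50
weeks_to_lockdown = 6                 # ~Dec 10 to the Jan 23 Wuhan lockdown
doublings = 2 * weeks_to_lockdown     # 2 doublings/week, no mitigation

projected_hospitalized = initial_hospitalized * 2 ** doublings
print(projected_hospitalized)         # 204800, i.e. the ~200,000 in the text

deaths_per_hospitalization = 0.2      # assumption implied by the ~40,000 figure
print(int(projected_hospitalized * deaths_per_hospitalization))  # ~41,000
```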

Now, this argument does clearly leave open the possibility that someone from the WIV was patient 0 and brought Covid to the market. But that would still be a very unlikely coincidence, probably a much stronger one than the pandemic starting in Wuhan to begin with.

...actually, if you have such strong skepticism of the case data, why do you believe the pandemic began in Wuhan at all? We know that covid can spread quickly from city to city, and there are likely to be a lot of missing cases. It could have spilled over in the countryside (Hubei has civet farms, for example) or in another city, and patient 0 could have hopped on a train to Wuhan.

A second set of objections usually revolves around China manipulating the data in some capacity. Again, this is possible. But then, again, you have to ask why you believe anything at all. If China could fake all of the early case data, why couldn’t they make it seem like it started somewhere else? Couldn’t the early cases seeming to be in Wuhan itself be an attempt to draw attention away from somewhere even less likely to be the origin of a natural pandemic? If they want to make it seem like the market is the source, why not create a fake lab test of an animal? The term “conspiracy theory” is overused, but once you start speculating that some malevolent entity has both the motive and the capability to do whatever is convenient for your theory, it quickly becomes unfalsifiable.

In addition, China doesn’t seem to have had any particular motive to frame the Huanan Seafood Market (HSM) instead of the WIV. Back in December 2019 and January 2020, they seemed mostly interested in covering up the existence of an outbreak at all. But this failed miserably, with Chinese doctors reporting on the serious new disease even as the government arrested several of them (food for thought if you think China could hide all evidence of a lab leak). Their current position is that the virus is of American origin. Why would they try to make the market seem like the origin? If they could impart arbitrary bias onto the data, why not have it cluster around a hotel, train station, or airport, which would be more consistent with the idea that it came from somewhere else?

Conclusion

It is always possible to come up with some explanation whereby the conspiracy just so happens to behave in exactly the way that prevents you from firmly disproving the theory. This pattern reminds me of the warning in Contaminated by Optimism:

It is a fact of life that we hold ideas we would like to believe, to a lower standard of proof than ideas we would like to disbelieve. In the former case we ask “Am I allowed to believe it?” and in the latter case ask “Am I forced to believe it?”

Is the evidence for a zoonotic origin sufficient to force you to believe it? No; that would probably have required an open and thorough investigation into the market back in December 2019.

Is the evidence for a lab leak strong enough to allow you to believe it? Sure, if you believe that enough relevant evidence would be hidden, and that the pandemic starting in Wuhan is a really strong coincidence.

Is it actually the case that lab leak is the most likely explanation? Far from it, in my opinion.

Addendum: Debate Results

Since I started writing this post, the results of the Rootclaim debate have been announced. Both judges agreed with Miller’s evaluation, that the zoonotic origin is substantially more likely. Judge Will’s decision is at https://www.youtube.com/watch?v=YlxTztAkdGQ&ab_channel=PeterMiller and Eric’s decision is at https://www.youtube.com/watch?v=OKwunTJ1b40&ab_channel=PeterMiller.

Both videos contain links to the judges’ decision-making processes in the descriptions. I highly recommend looking at them (I’m still making my way through them), as well as Rootclaim’s response linked above, especially if you didn’t want to watch the original videos. (And Rootclaim is switching to a primarily written format in the future!)

Appendix: Genetic features

Other than the pandemic starting location, the main lines of evidence cited (by either side) concern genetic features of the virus. As far as I can tell:

  1. Some aspects of the viral genome look somewhat weird, but are difficult to clearly identify as being strong evidence of either lab leak or zoonosis. For example, the CGG codon is rare in human coronaviruses, but it’s not that rare and the reason for this (C and G provoking an immune response) may just not apply to this particular virus for unknown reasons.

  2. We just don’t know enough about viral evolution. For example, as I discussed above and in one of my comments, furin cleavage sites do not appear in sarbecoviruses (the group of viruses that includes SARS-CoV-2) but do appear frequently in other, slightly less-related groups of coronaviruses. I don’t really know how to turn these facts into a probability, as evolution is a complex, dynamic process, and the FCS is related to the virus’s infectiousness in humans (and thus to its ability to create a pandemic). If I wanted to ignore this complexity, it looks at a glance like something like 1/2 of all betacoronaviruses have an FCS, so I could say the Bayes factor is actually only about 2:1 in favor of lab leak, even before accounting for the fact that the FCS makes the virus more likely to infect humans; but that would be just as wrong, just with bias in the other direction (this reference-class sensitivity is spelled out in the sketch after this list).

  3. There’s nothing that conclusively identifies the virus as being engineered, such as a feature that appears nowhere else in nature but is common in engineered viruses.

  4. While some more detailed analysis, future data, etc. could possibly shed more light on this question, it is certainly not possible to take 1 or 2 easily-summarized soundbites and perhaps a paragraph or 2 of analysis and come to the conclusion that the viral genome shows clear signs of being engineered.
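Here is the reference-class sensitivity from point 2 spelled out. This is a sketch: the prevalence figures are the rough ones quoted above, and it naively assumes an engineered virus would always carry an FCS:

```python
# Naive Bayes factor for "has an FCS" under two reference classes.
p_fcs_given_lab = 1.0   # naive assumption: engineering always adds an FCS

reference_classes = {
    "sarbecoviruses (0 of ~800 observed)": 1 / 800,  # placeholder for "none seen"
    "betacoronaviruses (~1/2 observed)":   0.5,
}

for label, p_fcs_natural in reference_classes.items():
    bf = p_fcs_given_lab / p_fcs_natural
    print(f"{label:38s}: naive BF ~ {bf:g}x for lab leak")

# 800x vs 2x: the answer is almost entirely an artifact of the chosen
# reference class, and both ignore the selection effect that a
# pandemic-causing virus is unusual by construction.
```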

In conclusion, it’s difficult to assign a strong Bayes factor in either direction here, and going into all of the details would be a bit much. I recommend the Rootclaim debate, both in video form and the judges’ conclusions, to provide more specifics on both this topic and the epidemiological evidence.

  1. ^

    The time from the SARS-1 pandemic to Covid was 15-17 years, depending on how you count. A common technique here is to assume you are in the “middle” of the time period between events, so you double the gap to get 1 natural pandemic every 30-34 years. Going back further, there were 2 major flu pandemics starting in Hong Kong and Guizhou in the 50s and 60s. That’s 3 pandemics in the 70 years prior to Covid. We could go back further and make the rate lower, but even rounding up to 90 years puts us at exactly 1 natural viral pandemic in this region per 30 years. The extent to which the flu pandemics should weigh on this question is an exercise for the reader, but even just looking at SARS-1 I find 1/80 per year to be too low.
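For what it’s worth, the footnote’s arithmetic in code (the counts and windows are the rough historical figures above):

```python
# Natural pandemics emerging from South/Central China and nearby regions.
pandemics = 3          # SARS-1 plus the two flu pandemics (1950s-60s)
window_years = 90      # rounded up from the ~70 years before Covid

rate_per_year = pandemics / window_years
print(f"~{rate_per_year:.3f} natural pandemics/year")  # ~0.033, i.e. 1 per 30 years

# Doubling the 15-17 year SARS-1-to-Covid gap gives 1 per 30-34 years,
# consistent with the count above and well above the 1/80-per-year figure.
```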