French long COVID study: Belief vs Infection

Thanks to JustisMills and Ruby for reviewing this post. Any errors are my own.
TLDR: The French long COVID study which suggests that belief in having had COVID is correlated with long COVID symptoms, but that actually having had COVID is not, used the wrong statistical tool to obtain this result. In reality, the study data show that long COVID symptoms are correlated with having had COVID and agree with Scott’s conclusions in Long COVID: Much more than you wanted to know.
The authors suggest that nearly all long COVID symptoms might not be caused by SARS-CoV-2 (except for those associated with anosmia). I believe that this is true in some cases but not remotely close to the extent suggested by the paper.
Study Design
Roughly speaking the experimental setup was:
Send out a whole load of serology tests to ~36,000 people in May-Nov 2020 (serology tests are antibody tests intended to show whether you have ever had COVID).
Perform the tests (~27,000 received back), then give participants their results.
Send out a questionnaire about whether participants think they’ve had COVID and what persistent symptoms they’ve had (Dec 2020-Jan 2021).
Exclude some people for reasons.
e.g. participants who believed that they caught COVID after taking the serology test were excluded.
Run some logistic regressions on different symptoms vs belief in having had COVID and/or serology results.
You may have spotted the first problem. We’re trying to test whether people’s belief that they’ve had COVID or their actually having had COVID is a better predictor of long COVID symptoms, but participants were given their serology results before being asked whether they think they’ve had COVID.
You’d think that this would ruin the results – belief in having had COVID should be extremely well correlated with having a positive serology result.
Fortunately (?!) this doesn’t seem to be the case. Of everyone who had a positive serology result, only 41.5% replied that they thought they’d had COVID. Of everyone who thought they’d had COVID, 50.4% had had a negative serology result.
I’m super confused by this but I’ll take this at face value for the moment and move on to the analysis.
Combined effects logistic regression with correlated predictors
The main reported result comes from model 3 of the study’s analysis. This is the combined effects logistic regression model which uses 2 predictors:
Belief in having had COVID.
Serology results.
To predict:
Presence of persistent symptoms (18 different symptoms).
The result of this model was that a lot of symptoms (16/18) were predicted well by belief in having had COVID but that only anosmia was predicted by serology results.
This seems pretty damning of long COVID symptoms being caused by SARS-CoV-2, at least until we consider the correlation between the 2 predictive properties.
Consider the following example with 100 participants:
89 are negative for belief and serology.
None have symptom A.
10 are positive for belief and serology.
9 have symptom A.
1 does not have symptom A.
1 is positive for belief but negative for serology.
They have symptom A.
Running the equivalent of model 3 from the study on these data will show that belief in having had COVID is a positive predictor of symptom A but that a positive serology result is a negative predictor of symptom A.
At the same time, 90% of people who had COVID have symptom A compared to 1.1% of people who didn’t have COVID!
This is kinda tricky to explain but bear with me.
Taking each predictor separately, belief is a stronger predictor of symptom A than serology.
This is due to the last participant mentioned (the only participant whose belief and serology don’t match). For them, positive belief correctly predicts having symptom A, whereas their negative serology would predict not having it.
The model notices this difference in predictive power and makes belief a strong positive predictor.
It then looks at any variation which isn’t explained by belief but that can be explained by serology.
Consider the last 11 people on the list (all the positive for belief participants).
100% of people who were negative for serology had symptom A.
90% of people who were positive for serology had symptom A.
So, given that someone is positive for belief, being positive for serology actually decreases the probability of having symptom A.
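The conditional probabilities above can be checked directly with a few lines of Python. The counts are the made-up numbers from this toy example, not study data:

```python
# Toy example counts: (belief, serology, has_symptom_A, n_participants)
cells = [
    (0, 0, 0, 89),  # belief-negative, serology-negative, no symptom
    (1, 1, 1, 9),   # belief-positive, serology-positive, symptom
    (1, 1, 0, 1),   # belief-positive, serology-positive, no symptom
    (1, 0, 1, 1),   # belief-positive, serology-negative, symptom
]

def rate(rows, cond):
    """Fraction with symptom A among participants matching cond(belief, serology)."""
    total = sum(n for b, s, y, n in rows if cond(b, s))
    symptomatic = sum(y * n for b, s, y, n in rows if cond(b, s))
    return symptomatic / total

# Marginally, serology looks like a strong predictor of symptom A:
print(rate(cells, lambda b, s: s == 1))  # 0.9    (9 of 10 serology-positive)
print(rate(cells, lambda b, s: s == 0))  # ~0.011 (1 of 90 serology-negative)

# But conditional on positive belief, serology flips sign:
print(rate(cells, lambda b, s: b == 1 and s == 0))  # 1.0 (1 of 1)
print(rate(cells, lambda b, s: b == 1 and s == 1))  # 0.9 (9 of 10)
```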
In reality the coefficients for the 2 predictors are fitted jointly (by maximum likelihood), but the result is the same.
Probably people who are familiar with statistics are cringing slightly at that explanation but I hope it gives an intuitive idea of what is happening. Essentially:
All of the examples where positive serology makes you more likely to have symptom A are better explained (according to the model) by being positive for belief.
After adjusting for belief, being serology positive makes you slightly less likely to experience symptom A.
Of course, this example is me just making up numbers to show how counter-intuitive the results from this kind of model can be.
However, hopefully it illustrates the problems you can have when running a combined effects logistic regression with correlated predictors. This might not be a problem (or even be a feature) in some cases but when one of your predictors (having COVID) often causes the other (believing that you had COVID) then you have to think more carefully about your model.
Serology
Is there a simple way to assess whether COVID causes the symptoms in the study? Yes, just run the logistic regression with serology results as the only predictor. Fortunately for us the study includes this model – model 2.
Model 2 results show that the likelihoods of experiencing the following persistent symptoms are increased by having had COVID (odds ratio / percentage point increase vs serology negative):
Fatigue (2.59 / 5.0%)
Anosmia (15.69 / 4.3%)
Poor attention/concentration (2.10 / 2.8%)
Breathing difficulties (3.60 / 2.3%)
Chest pain (3.70 / 1.4%)
Palpitations (2.61 / 1.2%)
Headache (1.69 / 0.9%)
Dizziness (2.37 / 0.6%)
Cough (2.22 / 0.6%)
Other symptoms (1.91 / 1.3%)
If we add all the percentage point increases (i.e. how many more percentage points serology positive participants experienced persistent symptoms vs serology negative participants—data from table 2) then we get 20.3%. So having COVID on average gives you ~0.2 persistent symptoms vs not having COVID, with presumably some people having more than one symptom.
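That addition is easy to sanity-check. Note that the one-decimal-place values listed above sum to 20.4 rather than exactly 20.3, presumably because the post’s figure comes from the unrounded table 2 data:

```python
# Percentage-point increases (serology-positive vs negative), model 2 / table 2
pp_increase = {
    "fatigue": 5.0, "anosmia": 4.3, "poor attention/concentration": 2.8,
    "breathing difficulties": 2.3, "chest pain": 1.4, "palpitations": 1.2,
    "headache": 0.9, "dizziness": 0.6, "cough": 0.6, "other symptoms": 1.3,
}

total = sum(pp_increase.values())
print(round(total, 1))        # 20.4 with these rounded values (20.3 from unrounded data)
print(round(total / 100, 2))  # 0.2 extra persistent symptoms per infection on average
```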
This is roughly in line with Scott’s conclusions in Long COVID: Much more than you wanted to know. The specific symptoms experienced are also in line with that post, so if that post reflects your current understanding of long COVID then I wouldn’t update much based on this study except to add some more confidence to a couple of the points Scott makes:
2. The prevalence of Long COVID after a mild non-hospital-level case is probably somewhere around 20%, but some of this is pretty mild.
3. The most common symptoms are breathing problems, issues with taste/smell, and fatigue + other cognitive problems.
Serology vs Belief
Can we say anything about how much effect belief in having had COVID has on Long COVID compared to actually having had COVID?
I think it’s difficult based on this study, because participants knew their serology results before stating their belief and I really have no idea how this affected the results. I’ll keep pretending that this isn’t an issue for the moment.
We can compare model 2 (serology) results to model 1 (belief in having had COVID), along with values from table 2. The percentage-point increases for belief are on average 2.17x (range 1.55-2.92) higher than the equivalents for serology (for the symptoms which are significant for serology). So if the belief figures represent the full population who report symptoms, then actually having had COVID accounts for 46% of those. If we include the other symptoms which aren’t significant for serology then this number gets lower.
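The 46% is just the reciprocal of that average ratio. A minimal sketch, using only the 2.17 average and 1.55-2.92 range quoted above:

```python
# Average ratio of belief-associated to serology-associated percentage-point
# increases, for the symptoms significant under serology (model 1 vs model 2)
avg_ratio = 2.17
lo_ratio, hi_ratio = 1.55, 2.92

# If belief captures everyone reporting a symptom, the share attributable
# to actually having had COVID is roughly 1 / ratio:
print(round(1 / avg_ratio, 2))  # 0.46
print(round(1 / hi_ratio, 2), round(1 / lo_ratio, 2))  # 0.34 0.65
```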
At face value this suggests that just over half of the people with long COVID symptoms who think that they had COVID are wrong. This is important but not the same as “A serology test result positive for SARS-COV-2 was positively associated only with persistent anosmia” as is reported in the study.
If we factor in the obvious problems with the experimental setup, then it’s hard to know how much credence to give the study’s data on this topic.