Our World in Data offers a free download of their big Covid-19 dataset. It’s got data on lots of things including cases, deaths, and vaccines (full list of columns here), and all that by country and date—i.e., each row corresponds to one (country, date) pair, with dates ranging from 2020-02-24 to 2021-08-20 for each country in steps of one day.
Is there any not-ultra-complicated way to demonstrate vaccine effectiveness from this dataset? I.e., is there any way to measure the effect such that you would be confident predicting the direction ahead of time? (E.g., something like, for date Z, plot all countries by x=% vaccinated and y=#cases and measure the correlation, as in the sketch below, but you can make it reasonably more complicated than this by controlling for a handful of variables or something.)
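A minimal sketch of that baseline analysis, assuming the OWID CSV file (owid-covid-data.csv) and column names like `people_fully_vaccinated_per_hundred` and `new_cases_smoothed_per_million`; the exact columns to use are an assumption and may need adjusting against the actual file:

```python
# Sketch: for a fixed date Z, correlate % fully vaccinated with new cases per million
# across countries, using the OWID Covid-19 CSV. Column names are assumptions.
import pandas as pd

df = pd.read_csv("owid-covid-data.csv", parse_dates=["date"])

Z = "2021-08-01"  # hypothetical choice of date Z
snapshot = df[df["date"] == Z][
    ["location", "people_fully_vaccinated_per_hundred", "new_cases_smoothed_per_million"]
].dropna()

# Spearman rank correlation, to reduce sensitivity to outlier countries
corr = snapshot["people_fully_vaccinated_per_hundred"].corr(
    snapshot["new_cases_smoothed_per_million"], method="spearman"
)
print(f"Correlation across {len(snapshot)} countries on {Z}: {corr:.2f}")
```

Controlling for a handful of variables would mean something like regressing cases on vaccination plus the controls (e.g. with statsmodels) rather than a raw correlation, but the structure is the same.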
What do you mean by “demonstrate vaccine effectiveness”? My instinct is that it’s going to be ~impossible to prove a causal result in a principled way just from this data. (This is different from how hard it will be to extract Bayesian evidence from the data.)
For intuition, consider the hypothesis that countries can (at some point after February 2020) unlock Blue Science, which decreases cases and deaths by a lot. If the time to develop and deploy Blue Science is sufficiently correlated with the time to develop and deploy vaccines (and the common component can’t be measured well), it won’t be possible to distinguish causal effectiveness of vaccines from causal effectiveness of Blue Science.
(A Bayesian would draw some update even from an uncontrolled correlation, so if you want the Bayesian answer, the real question is “how much of an update do you want to demonstrate (and assuming what prior)”?)
I mean something like, “a result that would constitute a sizeable Bayesian update to a perfectly rational but uninformed agent”. Think of someone who has never heard much about those vaccine thingies going from 50⁄50 to 75⁄25, that range.
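For what it’s worth, going from 50⁄50 to 75⁄25 corresponds to a Bayes factor of about 3, i.e. the observed pattern would need to be roughly three times as likely under “vaccines work” as under the alternative. A tiny worked version of that odds arithmetic:

```python
# Bayes' rule in odds form: posterior odds = prior odds * Bayes factor.
prior_odds = 0.5 / 0.5          # 50/50  -> 1:1
posterior_odds = 0.75 / 0.25    # 75/25  -> 3:1
required_bayes_factor = posterior_odds / prior_odds
print(required_bayes_factor)    # 3.0 -- the evidence must be ~3x likelier under "vaccines work"
```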