You are correct that someone who has one allergy may be more likely to have another allergy, and that this violates the assumptions of our model. Our model relies on a strong independence assumption, and there are many realistic cases where this assumption will not hold. I also agree that the video uses an example where the assumption may not hold. The video is oversimplified on purpose, in an attempt to get people interested enough to read the arXiv preprint.
If there is a small correlation between baseline risk and effect of treatment, this will have a negligible impact on the analysis. If there is a moderate correlation, you will probably be able to bound the true treatment effect using partial identification methods. If there is a strong correlation, this may invalidate the analysis completely.
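As a rough illustration of how the independence assumption enters, here is a small numerical sketch. It is a toy model built on simplifying assumptions, not the model from the preprint: each person either has or lacks a baseline cause of the outcome (probability `p`), either has or lacks a susceptibility to the treatment (probability `q`), and `rho` is the correlation between those two binary traits. All names and numbers are hypothetical.

```python
import math

def survival_ratio(p, q, rho):
    """Survival ratio (treated / untreated) in a toy two-cause model.

    Assumptions, for illustration only:
      p   = probability of the baseline cause (outcome occurs without treatment)
      q   = probability of being susceptible to the treatment
      rho = correlation between the two binary traits
    Under treatment, the outcome occurs if either trait is present.
    """
    # Joint probability of having both traits, from the marginals and rho.
    both = p * q + rho * math.sqrt(p * (1 - p) * q * (1 - q))
    if not max(0.0, p + q - 1) <= both <= min(p, q):
        raise ValueError("rho is not feasible for these marginals")
    surv_untreated = 1 - p
    surv_treated = 1 - (p + q - both)  # neither trait present
    return surv_treated / surv_untreated

# Two hypothetical populations that differ only in baseline risk.
low_risk, high_risk = 0.1, 0.4
q = 0.2

# Independent traits: the survival ratio equals 1 - q in both populations.
indep = [survival_ratio(p, q, 0.0) for p in (low_risk, high_risk)]

# Correlated traits: the survival ratio now depends on baseline risk.
corr = [survival_ratio(p, q, 0.3) for p in (low_risk, high_risk)]

print(indep)
print(corr)
```

With `rho = 0` the survival ratio is the same in both populations. With `rho = 0.3` it differs between them, and in this toy model the gap is proportional to `rho`, which is one way to see why a small correlation matters little and a strong one matters a lot.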
The point we are making is not that the model will always hold exactly. Any model is an approximation. Let’s suppose we have three choices:
(1) Use a template for a causal model that “counts the living”, think about all the possible biological reasons that this model could go wrong, represent them in the model if possible, and account for them as best you can in the analysis.
(2) Use a template for a causal model that “counts the dead”, think about all the possible biological reasons that this model could go wrong, represent them in the model if possible, and account for them as best you can in the analysis.
(3) Use a model that is invariant to whether you count the living or the dead. This cannot be based on a multiplicative (relative risk) parameter.
The third approach will not be sensitive to the particular problems that I am discussing, but all the suggested methods of this type have their own problems. As I have written earlier, my view is that these problems are more troubling than the problems with relative risk models.
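A quick numerical sketch of why the choice of what to count matters for a multiplicative model, and of what invariance means for the third option. The numbers are made up for illustration, the helper names are hypothetical, and the odds ratio appears only as one familiar example of a measure that does not depend on which outcome is counted.

```python
def risk_ratio(r0, r1):
    """Ratio of event risks, treated over untreated ("counting the dead")."""
    return r1 / r0

def survival_ratio(r0, r1):
    """Ratio of non-event probabilities, treated over untreated ("counting the living")."""
    return (1 - r1) / (1 - r0)

def odds_ratio(r0, r1):
    """Odds ratio for the event, treated over untreated."""
    return (r1 / (1 - r1)) / (r0 / (1 - r0))

# Made-up study population: risk 0.02 untreated, 0.04 treated.
r0, r1 = 0.02, 0.04
rr = risk_ratio(r0, r1)
sr = survival_ratio(r0, r1)

# Carry each parameter over to a hypothetical target population
# whose untreated risk is 0.6.
target = 0.6
pred_from_rr = target * rr            # assumes the risk ratio is stable
pred_from_sr = 1 - (1 - target) * sr  # assumes the survival ratio is stable

# Relabeling the outcome (event <-> non-event) only inverts the odds ratio.
or_dead = odds_ratio(r0, r1)
or_alive = odds_ratio(1 - r0, 1 - r1)

print(pred_from_rr, pred_from_sr)
print(or_dead * or_alive)
```

In this example the risk ratio of 2 predicts a risk of 1.2 in the target population, which is not a probability, while the survival ratio predicts a risk of about 0.61. The two odds ratios are reciprocals of each other, so nothing depends on which outcome is labeled as the event.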
What we are arguing in this preprint is that if you decide to go with a relative risk model, you should choose between (1) and (2) based on the principles suggested by Sheps, and then reason about the problems with this model and how they can be addressed in the analysis, based on the principles that you have correctly outlined in your comment.
I can assure you that if you decide to go with a multiplicative model but choose the wrong “base case”, then all of the problems you have discussed in your comments will be orders of magnitude more difficult to deal with in any meaningful way. In other words, it is only after you make the choice recommended by Sheps that it even becomes possible to meaningfully analyze the reasons for deviation from effect homogeneity...