A side question, prompted by an amusing factoid in the Hernan paper: “...we restricted the population to women who had reported plausible energy intakes (2,510–14,640 kJ/d)”.
In the statistical analysis in this paper, and also as a general practice in medical publications based on questionnaire data, are there adjustments for uncertainty in the questionnaire responses?
When you have a data point that says, for example, that person #12345 reports her caloric intake as 4,000 calories/day, do you take it as a hard precise number, or do you take it as an imprecise estimate with its own error which propagates into the model uncertainty, etc.?
Keyword is “measurement error.” People think hard about this. Anders_H knows this paper in a lot more detail than I do, but I expect these particular authors to be careful.
This issue is also related to “missing data.” What you see might be different from the underlying truth in systematic ways, e.g. you get systematic bias in your data, and you need to deal with that. This is also related to that causal inference stuff I keep going on about.
Keyword is “measurement error.” People think hard about this.
People like engineers and physicists think a lot about this. I am not sure that medical researchers do. The usual (easy) way is to throw out unreasonable-looking responses during data cleaning and then treat what remains as rock-solid. Accepting that your independent variables are uncertain leads to a lot of inconvenient problems (starting with OLS regression no longer being a theoretically correct model).
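To see why uncertain independent variables break OLS, here is a minimal simulation (all numbers made up for illustration) of classical measurement error: when the predictor is observed with mean-zero noise, the fitted slope shrinks toward zero by the reliability ratio var(true)/(var(true)+var(error)), a result known as attenuation bias.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

true_x = rng.normal(10, 2, n)            # true intake (arbitrary units)
y = 3.0 * true_x + rng.normal(0, 1, n)   # outcome depends on the *true* value

noise_sd = 2.0                           # reporting error, same scale as the signal
observed_x = true_x + rng.normal(0, noise_sd, n)

def ols_slope(x, y):
    """Slope of a simple least-squares fit of y on x."""
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Reliability ratio here: var(true_x) / (var(true_x) + var(error)) = 4/8 = 0.5,
# so the slope estimated from noisy reports is roughly half the true slope.
print(ols_slope(true_x, y))       # ~3.0
print(ols_slope(observed_x, y))   # ~1.5
```

The point of the sketch is that no amount of data fixes this: with 100,000 observations the attenuated estimate is very precise and still wrong, which is why pretending the cleaned responses are exact does not make the problem go away.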
What you see might be different from the underlying truth in systematic ways, e.g. you get systematic bias in your data, and you need to deal with that.
Yes, that’s another can of worms. In some areas (e.g. self-reported food intake) the problem is so blatant and overwhelming that you have to deal with it, but if it looks minor not many people want to bother.
I am not sure that medical researchers think a lot about this.
Clinicians do not, but “methodology people” (who often partner with “domain experts” to do data analysis) absolutely do.