I see what you’re saying, but I was thinking of a case where there is zero probability of having overlap among all features. While that technically restores the property that you can multiply the dataset by arbitrarily large numbers, it feels a little like “cheating”, and I agree with your larger point.
I guess Simpson’s paradox does always have a right answer, namely “stratify along all features”; it’s just that the amount of data you need increases exponentially in the number of relevant features. So I think that in the real world you can multiply the amount of data by a very, very large number and it still won’t solve the problem, even though a large enough multiplier eventually would.
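To make the “stratify along all features” point concrete, here’s a quick Python sketch (my illustration, not anything from this thread) using the oft-cited kidney-stone numbers from Charig et al. (1986): the aggregate comparison and the stratified one disagree, and stratifying on the confounder is what gives the right answer.

```python
# Simpson's paradox with the classic kidney-stone numbers (Charig et al. 1986):
# treatment B looks better in aggregate, but A is better within every stratum.

data = {
    # (treatment, stone_size): (successes, trials)
    ("A", "small"): (81, 87),
    ("A", "large"): (192, 263),
    ("B", "small"): (234, 270),
    ("B", "large"): (55, 80),
}

def rate(successes, trials):
    return successes / trials

# Aggregate comparison: B wins (82.6% vs 78.0%).
for t in ("A", "B"):
    s = sum(data[(t, size)][0] for size in ("small", "large"))
    n = sum(data[(t, size)][1] for size in ("small", "large"))
    print(f"{t} overall: {rate(s, n):.1%}")

# Stratified by stone size: A wins in both strata.
for size in ("small", "large"):
    for t in ("A", "B"):
        print(f"{t} {size}: {rate(*data[(t, size)]):.1%}")

# The catch the parent comment points at: with k relevant binary features
# there are 2**k strata, so the data needed to estimate every stratum
# grows exponentially in k.
```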
In the real world it’s often also sort of an open question whether the number of “features” is even finite.