Good one. I think maybe that’s true for some domains but not others?
Another way to look at it: there are a small number of low-hanging fruit that yield a lot of improvement. You could even call them “beginner gains”, since they’re easy. After that, you’re left with a long tail of modest improvements, but there’s no doubt in my mind that correctly stacking them can still yield some further improvement.
Also, sorry, I didn’t actually answer your main question. It’s something I’ve thought about quite a bit, but usually in the context of “not enough data to map out this very-high-dimensional space” rather than “not enough data to detect a small change”. The underlying problem is similar in both cases. I’ll probably write a post or two on it at some point, but here’s a very short summary.
Traditional probability theory relies heavily on large-number approximations; mainstream statistics uses convergence as its main criterion of validity. Small data problems, on the other hand, are much better suited to a Bayesian approach. In particular, if we have a few different models (call them Mi) and some data D, we can compute the posterior P[Mi|D] without having to talk about convergence or large numbers at all.
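Spelled out, that posterior is just Bayes’ rule applied at the level of models (here θi stands for Mi’s parameters; that notation is mine, added only to make the integral explicit):

$$P[M_i \mid D] = \frac{P[D \mid M_i]\, P[M_i]}{\sum_j P[D \mid M_j]\, P[M_j]}, \qquad P[D \mid M_i] = \int P[D \mid \theta_i, M_i]\, P[\theta_i \mid M_i]\, d\theta_i.$$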
The trade-off is that the math tends to be spectacularly hairy: P[Mi|D] usually involves high-dimensional integrals. Traditional approaches approximate those integrals in the large-data limit, but the whole point here is that we don’t have enough data for those approximations to be valid.
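To make that concrete, here’s a minimal sketch in Python. The dataset and the two models are toy examples of my own invention (not anything from the discussion above): “fair coin” versus “coin with unknown bias and a uniform prior”, compared on just ten flips. With a single parameter the evidence integral is easy to do on a grid; the hairiness the previous paragraph refers to shows up once θ has many dimensions and grids or closed forms stop being available.

```python
import numpy as np

# Toy data: 7 heads in 10 flips (purely illustrative numbers).
n_flips, n_heads = 10, 7

# Model M1: fair coin. No free parameters, so the evidence is just the likelihood.
evidence_m1 = 0.5 ** n_flips

# Model M2: unknown bias p with a uniform prior on [0, 1].
# Evidence P[D|M2] = integral over p of p^heads * (1 - p)^tails dp.
# One parameter, so a crude grid average over [0, 1] approximates it fine;
# this is exactly the integral that becomes hairy in high dimensions.
p_grid = np.linspace(0.0, 1.0, 100_001)
integrand = p_grid**n_heads * (1.0 - p_grid) ** (n_flips - n_heads)
evidence_m2 = integrand.mean()  # interval has length 1, so the mean approximates the integral

# Equal prior weight on the two models, then Bayes' rule; no appeal to
# convergence or large-sample approximations anywhere.
prior = np.array([0.5, 0.5])
evidence = np.array([evidence_m1, evidence_m2])
posterior = prior * evidence / np.sum(prior * evidence)

print(f"P[M1|D] (fair coin):    {posterior[0]:.3f}")
print(f"P[M2|D] (unknown bias): {posterior[1]:.3f}")
```

Note that nothing here depends on the sample size being large; the same computation goes through with three data points or three thousand, which is the appeal for small-data problems.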