I’m not sure what you’re asking here. The test set should of course be drawn from the same distribution as the future cases you actually care about. In practice, it can sometimes be hard to ensure that. But judging by performance on an arbitrary data set isn’t an option, since performance in the future does depend on what data shows up in the future (for a classification problem, on both the inputs, and of course on the class labels). I think I’m missing what you’re getting at....