I think this is cross-validation for tests. There have been several posts on Occam’s Razor as a way to find correct theories, but this is the first I have seen on cross-validation.
In machine learning and statistics, a researcher is often trying to find a good predictor for some data, and they often have some “training data” they can use to select the predictor from a class of potential predictors. Often more than one predictor performs well on the training data, so the question is how else one can choose an appropriate predictor.
One way to handle the problem is to use only a class of “simple predictors” (I’m fudging details!) and then use the best one: that’s Occam’s razor. Theorists like this approach and usually attach the word “information” to it. The other, “practitioner” approach is to use a bigger class of predictors, where you tune some of the parameters on one part of the data and tune other parameters (often hyper-parameters, if you know the jargon) on a separate part of the data. That’s the cross-validation approach.
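For concreteness, here is a minimal sketch of that split in Python. The setup (a toy ridge regression, the candidate penalties, the closed-form fit) is my own illustrative choice, not anything from the post: the ordinary weights are fit on one part of the data, and the hyper-parameter (the ridge penalty) is picked by its error on a held-out part.

```python
# A minimal sketch of the "tune parameters on one split, hyper-parameters on
# another" idea, using a toy ridge regression. All specifics here (data,
# candidate penalties, closed-form fit) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=200)

# Split the data: ordinary parameters are fit on the training part,
# the hyper-parameter (the penalty lam) is tuned on the validation part.
X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: (X'X + lam*I)^{-1} X'y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

best_lam, best_err = None, np.inf
for lam in [0.01, 0.1, 1.0, 10.0]:
    w = ridge_fit(X_train, y_train, lam)        # fit parameters on train
    err = np.mean((X_val @ w - y_val) ** 2)     # score on held-out data
    if err < best_err:
        best_lam, best_err = lam, err

print("chosen penalty:", best_lam, "validation error:", best_err)
```

(Full k-fold cross-validation just repeats this over several train/validation splits and averages the held-out errors, but the single split shows the idea.)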
There are some results on the asymptotic equivalence of the two approaches. But what’s cool about this post is that I think it offers a way to apply cross-validation to an area where I have never heard it discussed (I think, in part, because it’s the method of the practitioner and not so much the theorist; there are exceptions, of course!)