I would be wary of individual opinions on this, as I don’t believe it’s a settled question. For instance, my favorite textbook author, Prof. Frank Harrell, thinks 22k is “just barely large enough to do split-sample validation.” The adequacy of leave-one-out versus 10-fold depends on your available computational power as well as your sample size. 200 is almost certainly not enough to hold out 30% as a test set; an estimate from only 60 held-out cases has far too much variance.
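To put a rough number on that variance, here is a minimal sketch using the plain binomial standard error of an accuracy estimate. The 80% accuracy figure and the helper name `accuracy_standard_error` are my own assumptions for illustration, not something from the linked question.

```python
import math


def accuracy_standard_error(p: float, n_test: int) -> float:
    """Binomial standard error of an accuracy estimate from n_test held-out cases."""
    return math.sqrt(p * (1.0 - p) / n_test)


n_total = 200                          # sample size discussed above
n_test = int(round(0.30 * n_total))    # 30% hold-out -> 60 cases
assumed_accuracy = 0.80                # hypothetical true accuracy (assumption)

se = accuracy_standard_error(assumed_accuracy, n_test)
half_width = 1.96 * se                 # normal-approximation 95% interval half-width

print(f"test cases: {n_test}")
print(f"standard error: {se:.3f}")
print(f"approx. 95% interval: {assumed_accuracy:.2f} +/- {half_width:.3f}")
```

Under those assumptions the interval is roughly plus or minus 10 percentage points, which is wider than the differences you would usually hope to detect between models.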
On thinking about this more, I suppose the LOO/k-fold/split-sample question depends heavily on how much signal versus noise you expect. In the case you link to, they’re looking at behavioural health, which is far from deterministic, and where events like heart attacks occur in <5% of the population being studied. On top of that, the question-asker is trying to tease out what may be quite subtle differences between the performance of SVM, logistic regression, et cetera.
That’s interesting, and a useful update.
It also depends on the number of features in the model, their distribution, the distribution of the target variable, etc.