Releasing the data in dribs and drabs doesn’t address this either.
It does force researchers into an ad hoc cross-validation scheme, doesn’t it?
There’s a difference between, on the one hand, having the data freely available and being intelligent enough to use cross-validation, and, on the other, having someone paternalistically hold the data back from you.
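Concretely, the withheld portion acts as a free holdout set: whatever theory gets fit to the released drib is scored out of sample by the next one. Here is a minimal sketch of that dynamic; everything in it (the synthetic linear “law”, the noise level, the fifty-fifty split) is an assumption made purely for illustration.

```python
# Illustrative sketch only: a staged data release behaves like a holdout
# scheme, because the still-withheld portion ends up scoring hypotheses
# that were fit to the released portion. All numbers here are made up.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for nature: a noisy linear law.
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 2.0 + rng.normal(0.0, 2.0, size=200)

released = slice(0, 100)    # the first drib
withheld = slice(100, 200)  # what the custodian is still holding back

# A researcher fits a theory (here, a straight line) to the public portion.
theory = np.polyfit(x[released], y[released], deg=1)

def mse(coeffs, xs, ys):
    """Mean squared error of a polynomial theory against observations."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

# When the next drib arrives, it is effectively an out-of-sample test,
# whether or not anyone planned a formal validation protocol.
print("error on released data:", mse(theory, x[released], y[released]))
print("error on withheld data:", mse(theory, x[withheld], y[withheld]))
```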
If you start from the premise that researchers may fall into the overfitting trap, then you’re already treating them adversarially. And if just one researcher overfitting a theory that then becomes irrefutable would screw everything up, then the paranoid approach to data release prevents that total cockup (at the cost of some interim inefficiency, by hindering the responsible researchers).
It doesn’t prevent that entirely reliably, either. How much time are you going to give researchers to come up with hypotheses before you release the full set? And what do you do if someone comes up with a new hypothesis after the full release, one so mind-blowingly elegant and simple that it blows all the previously published ones out of the water?
If you think that some later hypotheses based on the full set might still be accepted, then you’re still vulnerable to the overfitting trap after the full release. If you don’t, then you’re locked forever into the theories scientists came up with during the partial-release window, and no later advance in the scientific method, rationality, computing, math, or even the intelligence of researchers will let you improve upon them.
This approach might get you some extra empirical evidence, but it will be empirical evidence about theories put together under quite limited conditions, compared to what will be available to later civilization.
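The trap itself is easy to exhibit in the same toy setup (again, every number below is an assumption of the example, not anyone’s real study): give a theory enough free parameters to chase the noise in the fully released set, and it looks unbeatable on the published data while failing on the fresh observations a later civilization would collect.

```python
# Illustrative sketch only: after a full release, a high-capacity "theory"
# can fit every published point, yet fail on data nobody has yet.
import numpy as np

rng = np.random.default_rng(1)

n = 20
x = np.sort(rng.uniform(0, 10, size=n))
y = 3.0 * x + 2.0 + rng.normal(0.0, 2.0, size=n)  # same noisy linear "law"

honest = np.polyfit(x, y, deg=1)    # the modest theory
overfit = np.polyfit(x, y, deg=15)  # enough capacity to memorize the noise
                                    # (NumPy may warn the fit is poorly
                                    # conditioned; that is rather the point)

def mse(coeffs, xs, ys):
    """Mean squared error of a polynomial theory against observations."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

# Fresh observations, drawn the same way, standing in for future data.
x_new = np.sort(rng.uniform(0, 10, size=n))
y_new = 3.0 * x_new + 2.0 + rng.normal(0.0, 2.0, size=n)

print("published data: honest", mse(honest, x, y),
      " overfit", mse(overfit, x, y))
print("fresh data:     honest", mse(honest, x_new, y_new),
      " overfit", mse(overfit, x_new, y_new))
```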
I’d rather wait for researchers to screw up and then hammer them.