I am not sure exactly why, but this and the optimization post both call to mind the current of thought suggesting we segregate the hypothesis and experimental steps explicitly. I have encountered this in three places:
An unfinished textbook on Arxiv (which I cannot now locate to my frustration) that described treating machine learning as a science, which proposed gathering data and then the goodness of machine learning algorithm is measured by compression.
This is basically how astronomy works by default: no one has a hypothesis for how pulsars interact and then gets a grant from their university department to launch a satellite network to look for pulsars; instead they identify phenomena on which they have little data, and pool resources to build a telescope or satellite or underground neutrino detector to gather the data, and then the publications test their hypotheses against the data gathered from one or more such projects.
I have a vague intuition that dividing up scientific practice in this way chunks the dimensions more tractably, or at least allows for it. Allowing optimization of data gathering and hypothesis formulation independently seems like a clear win for similar reasons.
Maybe the appeal is that it allows hypotheses to come from multiple directions in dimension space. The dimensionality of a body of data is fixed, but if it is generated as a tuple with a single hypothesis then it can only be approached from the perspective of that single hypothesis; if it is independent, then any hypothesis concerned with any of the dimensions of the data can be applied. By analogy, consider convergent evolution: two different paths in phase space arrive at essentially the same thing. Segregating the data step radically compresses this by allowing hypotheses from any other chain of development to be tested against it directly.
In particular, the view in this post is extremely similar to the view in Macroscopic Prediction. As there, reproducible phenomena are the key puzzle piece.
I detect the ghost of Jaynes in this!
I am not sure exactly why, but this and the optimization post both call to mind the current of thought suggesting we segregate the hypothesis and experimental steps explicitly. I have encountered this in three places:
An unfinished textbook on Arxiv (which I cannot now locate to my frustration) that described treating machine learning as a science, which proposed gathering data and then the goodness of machine learning algorithm is measured by compression.
The Report likelihoods, not p-values article on Arbital.
This is basically how astronomy works by default: no one has a hypothesis for how pulsars interact and then gets a grant from their university department to launch a satellite network to look for pulsars; instead they identify phenomena on which they have little data, and pool resources to build a telescope or satellite or underground neutrino detector to gather the data, and then the publications test their hypotheses against the data gathered from one or more such projects.
I have a vague intuition that dividing up scientific practice in this way chunks the dimensions more tractably, or at least allows for it. Allowing optimization of data gathering and hypothesis formulation independently seems like a clear win for similar reasons.
Maybe the appeal is that it allows hypotheses to come from multiple directions in dimension space. The dimensionality of a body of data is fixed, but if it is generated as a tuple with a single hypothesis then it can only be approached from the perspective of that single hypothesis; if it is independent, then any hypothesis concerned with any of the dimensions of the data can be applied. By analogy, consider convergent evolution: two different paths in phase space arrive at essentially the same thing. Segregating the data step radically compresses this by allowing hypotheses from any other chain of development to be tested against it directly.
In particular, the view in this post is extremely similar to the view in Macroscopic Prediction. As there, reproducible phenomena are the key puzzle piece.