Yes, I think that was better, because the ground truth is Kepler’s third law and jimrandomh pointed out your method actually recaptures a (badly obfuscated and possibly overfit) variant of it.
“Dimensionality” is totally relevant in any approach to supervised learning. But it matters even without considering the bias/variance trade-off, etc.
Imagine that you have a high-dimensional predictor, of which one dimension completely determines the outcome and the rest are noise. Your shortest possible generating algorithm is going to have to pick out the relevant dimension. So as the dimensionality of the predictor increases, the algorithm length will necessarily increase, just for information-theoretic reasons.
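A quick way to see the size of that cost (the dimension counts below are just for illustration): naming one out of d dimensions takes on the order of log2(d) bits for a typical index, and that overhead has to live somewhere in the generating program, on top of the law itself.

```python
import math

# Minimal illustration (my numbers, not from the thread): if exactly one of d
# input dimensions drives the output, a generating program must at least name
# that dimension, which costs roughly log2(d) bits for a typical choice of index.
for d in (10, 1_000, 1_000_000, 10**9):
    print(f"d = {d:>13,}: ~{math.ceil(math.log2(d))} bits just to pick out the relevant dimension")
```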
How do you overfit Kepler’s law?
Edit: Retracted. Looking at the actual link, I now see the result wasn’t just obfuscated but wrong, and of course the way in which it’s wrong can overfit (which matches the results).
To the extent that Kepler’s laws are exact only for two-body systems of point masses (so I guess calling Kepler’s third law the ground truth is a bit problematic) and to the extent that the data are imperfectly observed, there are residuals that over-eager models can try to match.
Edit: More generally, you don’t overfit the underlying law, you overfit noisy data generated by the underlying law.
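A toy sketch of that distinction, under made-up assumptions (data generated from T = a**1.5 with a little noise; a degree-9 polynomial standing in for any over-flexible model; this is not the original experiment). Typically the flexible fit matches the noisy training points more tightly but tracks the true law less well on held-out inputs:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# Made-up observations generated by Kepler's third law, T = a**1.5
# (a in AU, T in years), with a little multiplicative "measurement" noise.
a_train = np.linspace(0.4, 10.0, 12)
T_train = a_train**1.5 * (1 + 0.02 * rng.standard_normal(a_train.size))

a_test = np.linspace(0.5, 9.5, 200)
T_test = a_test**1.5  # noiseless ground truth, used only for evaluation

# Model 1: a pure power law T = c * a**k, fitted as a line in log-log space.
k, log_c = np.polyfit(np.log(a_train), np.log(T_train), 1)

def power(a):
    return np.exp(log_c) * a**k

# Model 2: a degree-9 polynomial, flexible enough to chase the noise.
poly = Polynomial.fit(a_train, T_train, deg=9)

def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

print(f"fitted exponent: {k:.3f}  (Kepler: 1.5)")
print("train RMSE  power law:", rmse(power(a_train), T_train), " poly:", rmse(poly(a_train), T_train))
print("test  RMSE  power law:", rmse(power(a_test), T_test), " poly:", rmse(poly(a_test), T_test))
```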
Kepler’s law holds well. The influence of the other planets is negligible at the precision we dealt with.
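For a rough sense of how well it holds, here is a quick check of T²/a³ with textbook orbital elements (a in AU, T in years; values from memory, rounded to about three significant figures):

```python
# Quick check that T**2 / a**3 is nearly constant across the planets.
planets = {
    "Mercury": (0.387, 0.241),
    "Venus":   (0.723, 0.615),
    "Earth":   (1.000, 1.000),
    "Mars":    (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn":  (9.537, 29.457),
    "Uranus":  (19.19, 84.01),
    "Neptune": (30.07, 164.8),
}

for name, (a, T) in planets.items():
    print(f"{name:8s} T^2 / a^3 = {T**2 / a**3:.4f}")
```

The ratio stays within a few tenths of a percent of 1 across the planets, which is consistent with the mutual perturbations being negligible at that level of precision.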
Dimensions irrelevant to the output will fall out, regardless of whether they are random or not. If they contribute in any way at all, their influence will remain in the evolved algorithm.
The simplest algorithm in the Kolmogorov sense is the best you can hope for.
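As a loose stand-in for that selection pressure (this is not the evolutionary setup actually used, just a brute-force subset search over a made-up five-dimensional dataset with a crude description-length penalty), dimensions that don't reduce the error get discarded because they only add complexity:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

# Hypothetical toy data: 5 input dimensions, only dimension 2 drives the output.
X = rng.uniform(0.5, 10.0, size=(200, 5))
y = X[:, 2] ** 1.5

def score(subset):
    # Fit a power-law model using only the chosen dimensions (linear in log space),
    # then charge a small complexity penalty per dimension actually used.
    A = np.column_stack([np.log(X[:, j]) for j in subset] + [np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
    err = np.sum((A @ coef - np.log(y)) ** 2)
    return err + 0.1 * len(subset)  # each extra dimension must pay for itself

best = min(
    (s for r in range(1, 6) for s in combinations(range(5), r)),
    key=score,
)
print("selected dimensions:", best)  # should print (2,) in this setup
```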