This is only true as long as difference between simple and complex models are small, and only because simple model avoids overfitting problem. For many orders of magnitude failures choosing a simple and wrong over complex and right is not very intelligent.
There is a formal theory describing how to balance model complexity against fit to data: describe the model using a program on a simple, fixed turing machine, and then penalize that model by assigning a prior probability to it of 2^-L, where L is the length of the program…
If I understand the maths right the important part of http://www.hutter1.net/ai/paixi.ps for using kolmogorov complexity is the part of section 2 that says
“The SPΘμ system is best in the sense that EnΘμ ≤ Enρ for any ρ.”
That doesn’t guarantee that this EnΘμ = Enρ for large numbers of different ρ. isn’t true which would invalidate any claims of it being the one right way of doing things.
I was interested in links to papers with that theorem disproved.
This is only true as long as difference between simple and complex models are small, and only because simple model avoids overfitting problem. For many orders of magnitude failures choosing a simple and wrong over complex and right is not very intelligent.
There is a formal theory describing how to balance model complexity against fit to data: describe the model using a program on a simple, fixed turing machine, and then penalize that model by assigning a prior probability to it of 2^-L, where L is the length of the program…
this has all been worked out.
Which Turing machine, though?
How about picking one of these?
unpaywalled version
...using a model. (I suppose someone could argue that it’s not complex enough.)
It can’t be bettered on average, assuming that the thing you are modelling is computable.
But I haven’t seen any proof to say that any other strategy will do worse on average. Anyone got any links?
See Hutter, Legg.
If I understand the maths right the important part of http://www.hutter1.net/ai/paixi.ps for using kolmogorov complexity is the part of section 2 that says
“The SPΘμ system is best in the sense that EnΘμ ≤ Enρ for any ρ.”
That doesn’t guarantee that this EnΘμ = Enρ for large numbers of different ρ. isn’t true which would invalidate any claims of it being the one right way of doing things.
I was interested in links to papers with that theorem disproved.