There is a formal theory describing how to balance model complexity against fit to data: describe the model using a program on a simple, fixed Turing machine, and then penalize that model by assigning a prior probability to it of 2^-L, where L is the length of the program…
this has all been worked out.
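For what it's worth, here is a toy sketch of what that penalty looks like in practice (my own illustration, not anything from the papers): treat a model's description as a bit string, give it prior 2^-L, and compare models by prior-times-likelihood in log2 units.

```python
# Toy sketch (my own illustration, not from the paper): score candidate models
# by combining a 2^-L description-length prior with how well each model fits
# the data, all in log2 units. A "program" here is just a bit string standing
# in for a description on some fixed universal Turing machine.

def log2_prior(program_bits: str) -> float:
    """log2 of the prior 2^-L, where L is the program length in bits."""
    return -float(len(program_bits))

def log2_score(program_bits: str, log2_likelihood: float) -> float:
    """Unnormalized log2 posterior: complexity penalty plus fit to the data."""
    return log2_prior(program_bits) + log2_likelihood

# Example: the longer description has to buy enough extra fit to win.
simple_model = ("0101", -10.0)            # 4-bit description, worse fit
complex_model = ("010111010011", -3.0)    # 12-bit description, better fit

best = max([simple_model, complex_model], key=lambda m: log2_score(*m))
print(best)  # the simple model wins: 8 extra bits of description only buy 7 bits of fit
```

In this toy case the simple model wins by one bit, which is the whole point of the prior: extra description length has to pay for itself in improved fit.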
Which Turing machine, though?
How about picking one of these?
unpaywalled version
...using a model. (I suppose someone could argue that it’s not complex enough.)
It can’t be bettered on average, assuming that the thing you are modelling is computable.
But I haven’t seen any proof to say that any other strategy will do worse on average. Anyone got any links?
See Hutter, Legg.
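For anyone chasing those references: the flavour of result proved there (paraphrased from memory, so check the papers for the exact statement and constants) is that the universal predictor's total expected prediction error is bounded by a constant depending only on the complexity of the true environment μ:

```latex
% Solomonoff/Hutter-style error bound, paraphrased from memory -- see the
% papers for the exact statement and constants. M is the universal prior,
% mu any computable environment, K(mu) its Kolmogorov complexity.
\[
  \sum_{t=1}^{\infty}
    \mathbf{E}_\mu\!\left[
      \bigl( M(x_t = 1 \mid x_{<t}) - \mu(x_t = 1 \mid x_{<t}) \bigr)^2
    \right]
  \;\le\; \tfrac{\ln 2}{2}\, K(\mu)
\]
```

A finite total-error bound of this kind is what "can't be bettered on average" usually cashes out to; note that it doesn't, by itself, say every other strategy does strictly worse.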
If I understand the maths right, the important part of http://www.hutter1.net/ai/paixi.ps for the use of Kolmogorov complexity is the part of section 2 that says
“The SPΘμ system is best in the sense that EnΘμ ≤ Enρ for any ρ.”
That doesn’t guarantee that EnΘμ = Enρ doesn’t hold for a large number of different ρ, which would invalidate any claim that this is the one right way of doing things.
I’d be interested in links to papers that disprove that possibility.
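To put the worry in symbols (my restatement, not anything in the paper): "best" here only means a non-strict inequality, which leaves room for ties.

```latex
% My restatement of the concern, not a claim from the paper: the theorem is a
% non-strict inequality, so "best" means "no worse than any rho", which is
% consistent with ties.
\[
  E_{n\Theta\mu} \le E_{n\rho} \ \text{for all } \rho
  \quad\text{is compatible with}\quad
  E_{n\Theta\mu} = E_{n\rho} \ \text{for many } \rho .
\]
```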