One of the biases that is extremely prevalent in science, but rarely talked about anywhere, is the bias towards models that are mathematically simple and easy to operate on.
I think that this is a heuristic rather than a bias, because favoring simple models over complex ones is generally a good thing. In particular, the complexity prior is claimed by some to be a fundamental principle of intelligence.
This is only true as long as the differences between the simple and complex models are small, and only because the simple model avoids the overfitting problem. When the simple model fails by many orders of magnitude, choosing simple-and-wrong over complex-and-right is not very intelligent.
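A quick illustration of the overfitting half of this, using hypothetical data and ordinary least squares:

```python
import numpy as np

# Line vs. degree-9 polynomial on noisy linear data (hypothetical numbers):
# the complex model fits the training points better but generalizes worse --
# the overfitting problem that the simplicity bias guards against.
rng = np.random.default_rng(0)
truth = lambda x: 2 * x + 1
x_train = np.linspace(0, 1, 10)
y_train = truth(x_train) + rng.normal(0, 0.1, x_train.size)
x_test = np.linspace(0, 1, 100)

for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - truth(x_test)) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.2e}, test MSE {test_mse:.2e}")
```

The degree-9 polynomial interpolates the noise almost exactly, so its training error is near zero while its held-out error will typically be worse than the straight line's.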
There is a formal theory describing how to balance model complexity against fit to data: describe the model as a program on a simple, fixed Turing machine, and then penalize that model by assigning it a prior probability of 2^-L, where L is the length of the program…
This has all been worked out.
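For concreteness, here is a toy sketch of that prior in Python. This is not Solomonoff induction proper, since true Kolmogorov complexity is uncomputable; the models and their description lengths are hand-assigned for the sake of illustration:

```python
import math

# Toy illustration of the 2^-L complexity prior. The "program lengths" L
# below are made up; the point is only how prior and likelihood combine.

def bernoulli_likelihood(p):
    """i.i.d. Bernoulli(p) model for a binary sequence."""
    def ll(data):
        return math.prod(p if bit == 1 else 1.0 - p for bit in data)
    return ll

# (name, hypothetical program length L in bits, likelihood function)
models = [
    ("always-zero, L=3",  3,  bernoulli_likelihood(0.001)),
    ("fair coin,   L=5",  5,  bernoulli_likelihood(0.5)),
    ("biased 0.9,  L=12", 12, bernoulli_likelihood(0.9)),
]

data = [1, 1, 1, 0, 1, 1, 1, 1]

# Posterior ∝ 2^-L * P(data | model): the prior penalizes long programs,
# the likelihood penalizes poor fit, and the product balances the two.
weights = [(name, 2.0 ** -L * ll(data)) for name, L, ll in models]
total = sum(w for _, w in weights)
for name, w in weights:
    print(f"{name}: posterior = {w / total:.3f}")
```

On this mostly-ones sample the biased model fits better, but the fair coin's shorter description buys it a 2^7-times-larger prior, so it still wins; with enough additional ones the likelihood term would eventually flip the verdict.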
Which Turing machine, though?
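(For what it's worth, the usual partial answer is the invariance theorem: changing the reference machine shifts description lengths by at most an additive constant, so any two choices give priors that agree up to a multiplicative factor. The constant can be enormous, though, which I take to be the point of the question. The standard statement, roughly:)

```latex
% Invariance theorem (standard statement, not from this thread's links):
% for universal machines U and V there is a constant c_{UV},
% independent of x, such that
\[
  |K_U(x) - K_V(x)| \le c_{UV}
  \quad\Longrightarrow\quad
  2^{-c_{UV}} \, 2^{-K_V(x)} \;\le\; 2^{-K_U(x)} \;\le\; 2^{\,c_{UV}} \, 2^{-K_V(x)}
\]
```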
How about picking one of these?
unpaywalled version
...using a model. (I suppose someone could argue that it’s not complex enough.)
It can’t be bettered on average, assuming that the thing you are modelling is computable.
But I haven’t seen any proof to say that any other strategy will do worse on average. Anyone got any links?
See Hutter, Legg.
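For anyone who doesn't want to chase the references: the result usually cited is that the universal prior M multiplicatively dominates every computable environment μ, which in turn bounds M's total prediction error. From memory (check Hutter's book for the exact form):

```latex
% Standard dominance result for the universal prior M (stated from memory --
% see Hutter's "Universal Artificial Intelligence" for exact constants):
\[
  M(x) \;\ge\; 2^{-K(\mu)} \,\mu(x) \quad \text{for every computable } \mu,
\]
% which bounds M's total expected squared prediction error in environment mu:
\[
  \sum_{t=1}^{\infty} \mathbb{E}_{\mu}\!\Big[\big(M(1 \mid x_{<t}) - \mu(1 \mid x_{<t})\big)^{2}\Big]
  \;\le\; \tfrac{\ln 2}{2}\, K(\mu).
\]
```

So M's cumulative prediction error exceeds that of the informed predictor by at most an additive constant depending on K(μ), which is the sense in which it "can't be bettered on average".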
If I understand the maths right, the important part of http://www.hutter1.net/ai/paixi.ps for using Kolmogorov complexity is the part of Section 2 that says
“The SPΘμ system is best in the sense that E_n^{Θμ} ≤ E_n^{ρ} for any ρ.”
That doesn’t guarantee that E_n^{Θμ} = E_n^{ρ} isn’t true for large numbers of different ρ, which would invalidate any claim that it is the one right way of doing things.
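To make that worry concrete: in a maximally unpredictable environment the inequality is an equality for every ρ. Under a fair coin, any predictor errs with probability 1/2 on each bit, so E_n^{ρ} = n/2 for all ρ and the theorem singles out no winner. A quick simulation of my own (not from the paper):

```python
import random

# Under a fair-coin environment, every predictor errs with probability 1/2
# per bit, so E_n^ρ = n/2 for *all* ρ: the inequality E_n^{Θμ} ≤ E_n^ρ
# holds with equality everywhere and picks out no unique predictor.
random.seed(0)
n, trials = 1000, 2000

def always_one(history):
    return 1                                  # predictor 1: constant guess

def copy_last(history):
    return history[-1] if history else 0      # predictor 2: repeat previous bit

for name, predict in [("always_one", always_one), ("copy_last", copy_last)]:
    errors = 0
    for _ in range(trials):
        history = []
        for _ in range(n):
            bit = random.getrandbits(1)
            errors += predict(history) != bit
            history.append(bit)
    print(f"{name}: mean errors over {n} bits = {errors / trials:.1f}")
```

Both come out at roughly n/2 = 500 errors, despite being entirely different strategies.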
I would be interested in links to papers where that possibility is ruled out.