We don’t know what the shortest way to encode a very good approximation to relativity would be, either.
The idea is that if humans can come up with approximations of relativity that are good enough for the purpose of predicting their observations, then in principle SI can do so too.
The issue is the prior probability: since humans use a different prior than SI, it is not straightforward that SI will not favor shorter models that perform worse in practice. There are universality theorems which essentially prove that, given enough observations, SI will eventually catch up with any semi-computable learner, but the number of observations required for this to happen might be far from practical.
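The catch-up behavior can be seen in a toy model. The sketch below is a minimal illustration, not actual SI: the hypothesis names and "program lengths" are made-up stand-ins (real SI enumerates all computable predictors), but the mechanism is the same — a few next-bit predictors are mixed under a 2^(-length) prior and updated by Bayes, and the posterior concentrates on the predictor that matches the data:

```python
# Toy Solomonoff-style mixture: a handful of next-bit predictors weighted
# by a 2^(-length) prior. The "lengths" are hypothetical stand-ins for
# program lengths; true SI would enumerate all computable predictors.
hypotheses = {
    # name: (assumed length in bits, P(next bit = 1 | history))
    "always-0":  (2, lambda h: 0.01),
    "always-1":  (2, lambda h: 0.99),
    "alternate": (5, lambda h: 0.01 if (h and h[-1] == 1) else 0.99),
    "uniform":   (1, lambda h: 0.5),
}

weights = {name: 2.0 ** -length for name, (length, _) in hypotheses.items()}
data = [1, 0] * 50  # an alternating sequence, starting with 1

history = []
for bit in data:
    for name, (_, predict) in hypotheses.items():
        p1 = predict(history)
        weights[name] *= p1 if bit == 1 else (1.0 - p1)
    history.append(bit)

total = sum(weights.values())
posterior = {name: w / total for name, w in weights.items()}
```

Note that "alternate" starts with the smallest prior weight of the four, yet after 100 bits it dominates the posterior — the prior penalty is overwhelmed by the likelihood, which is the content of the universality theorems. The theorems only promise this happens eventually; with a prior that ranks the true hypothesis astronomically low, "eventually" can be far beyond any practical number of observations.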
For instance, there is a theorem which proves that, for any algorithm, if you sample problem instances according to a Solomonoff distribution, then average-case complexity will asymptotically match worst-case complexity. If the Solomonoff distribution were a reasonable prior for practical purposes, we should observe that, for all algorithms on realistic instance distributions, average-case complexity is of about the same order of magnitude as worst-case complexity. Empirically, this is not necessarily the case: the simplex algorithm for linear programming, for instance, has exponential worst-case time complexity but is usually very efficient (polynomial time) on typical inputs.
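The simplex case needs an LP solver to reproduce, but the same average-versus-worst-case gap appears in a simpler stand-in (my substitute example, not one from the argument above): quicksort with a naive first-element pivot, which is quadratic on adversarial (already sorted) input yet close to n·log n on random input.

```python
import random

def quicksort_comparisons(lst):
    """Naive quicksort (first element as pivot); returns the comparison count."""
    count = 0
    def qs(a):
        nonlocal count
        if len(a) <= 1:
            return a
        pivot, rest = a[0], a[1:]
        count += len(rest)  # every remaining element is compared to the pivot
        return (qs([x for x in rest if x < pivot])
                + [pivot]
                + qs([x for x in rest if x >= pivot]))
    qs(lst)
    return count

random.seed(0)
n = 500
# Average case over random permutations: a few thousand comparisons.
avg = sum(quicksort_comparisons(random.sample(range(n), n)) for _ in range(20)) / 20
# Worst case for this pivot rule is already-sorted input: n*(n-1)/2 = 124750.
worst = quicksort_comparisons(list(range(n)))
```

If instances were drawn from a distribution concentrating on short descriptions, the sorted list (a very short program) would dominate, and the average would be pulled toward the worst case; on the uniform distribution over permutations, the two differ by more than an order of magnitude even at n = 500.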