The short answer is, you can’t. Solomonoff induction is not computable, and moreover depends on your model of computation. Asymptotically, the model of computation makes no difference, but for any finite set of examples, there is (for example) a universal Turing machine with short codes for those examples built in.
Practical approaches to choosing among models work quite differently. One family of such methods is called regularization.
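As a concrete illustration of what one regularization method looks like in practice, here is a minimal sketch of ridge (L2-penalised) regression fitted in closed form with NumPy; the toy data and the penalty weight `lam` are illustrative assumptions, not anything from the thread.

```python
import numpy as np

# Toy data: y depends linearly on the first feature plus noise;
# the remaining features are irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=50)

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: argmin ||y - Xw||^2 + lam * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

print(ridge_fit(X, y, lam=0.0))   # ordinary least squares
print(ridge_fit(X, y, lam=10.0))  # coefficients shrunk towards zero
```

The larger penalty shrinks the coefficients of the irrelevant features towards zero, which is the sense in which regularization controls model complexity without explicitly counting parameters the way AIC or BIC do.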
Well that explains why I was struggling to find anything online!
Thanks for the link, I’ve been going through some of the techniques.
Using AIC, the penalty for each additional parameter is a factor of e. For BIC the equivalent is √n, so the more samples there are, the more heavily a complex model is penalised. For large n the two criteria diverge; are there principled methods for choosing which regularisation to use?
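For reference, those penalty factors follow directly from the standard definitions, in the usual notation where $\hat{L}$ is the maximised likelihood, $k$ the number of fitted parameters, and $n$ the sample size:

$$\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}.$$

On the likelihood scale a model's weight is proportional to $\exp(-\mathrm{AIC}/2)$ or $\exp(-\mathrm{BIC}/2)$, so each extra parameter costs a factor of $e^{2/2} = e$ under AIC and $e^{(\ln n)/2} = \sqrt{n}$ under BIC. The two agree when $\ln n = 2$, i.e. around $n \approx 7$, and BIC is the stricter criterion for any larger sample.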
At this point I reveal that I just play a statistician on the net. I don’t know how people choose from among the many methods available. Is there a statistician in the house?
Solomonoff induction is not computable because its hypothesis space is infinite, but Bucky is only asking about a finite subset.
Solomonoff induction is uncomputable because it requires knowing the set of all programs that compute a given set of data. If you just have two hypotheses in front of you, “Solomonoff induction” is not quite the right term: strictly understood, it is a method for extrapolating a given sequence of data, not for choosing between two programs that would generate the data seen so far. But even taking it to refer to the general idea of assigning probabilities to programs by their length, those lengths are still uncomputable if one considers only programs of minimal length in their equivalence class. And if you drop that requirement, the concepts of algorithmic complexity have little to say about the example.