The time and space bounded SI is then approximately optimal in the sense that its total error compared to this efficient predictor, as measured by KL-divergence from the predictions made by P∗, will be ≤|P∗| bits summed across all data points.[7]
I think that classical Solomonoff induction gives zero posterior to any program with less than perfect prediction record? I can see why this works for Solomonoff with unbounded description length, this is solved by the DH(h) term you mention above.
But for bounded Solomonoff you have to allow some non-perfect programs to stay in the posterior, or you might be left with no candidates at all?
Is there an easy way to deal with this?
This is not a problem if the programs are giving probabilistic outputs, but that’s a different class of programs than used in classical Solomonoff induction.
I do use programs that give probabilistic outputs here. See claim 1 and the setup section.
I am fairly sure that there is a version of Solomonoff induction where the programs themselves output probabilities, and it’s equivalent in the limit to the version where programs output binary answers. I think it’s called ‘stochastic’ Solomonoff induction, or something like that.
I hadn’t actually realised how load-bearing this might be for the proof until you pointed it out though. Thank you.
I think that classical Solomonoff induction gives zero posterior to any program with less than perfect prediction record? I can see why this works for Solomonoff with unbounded description length, this is solved by the DH(h) term you mention above.
But for bounded Solomonoff you have to allow some non-perfect programs to stay in the posterior, or you might be left with no candidates at all?
Is there an easy way to deal with this?
This is not a problem if the programs are giving probabilistic outputs, but that’s a different class of programs than used in classical Solomonoff induction.
I do use programs that give probabilistic outputs here. See claim 1 and the setup section.
I am fairly sure that there is a version of Solomonoff induction where the programs themselves output probabilities, and it’s equivalent in the limit to the version where programs output binary answers. I think it’s called ‘stochastic’ Solomonoff induction, or something like that.
I hadn’t actually realised how load-bearing this might be for the proof until you pointed it out though. Thank you.