The thing to prove is precisely convergence and calibration!
Convergence: Given arbitrary observed sequences, the probabilities of the latent bits (program tape and working tape) converge.
(Obviously the non-latent bits converge, if we assume they eventually get observed; hence the focus on the latent bits here.)
Calibration: The sequence of probabilistic predictions has the property that, if we take the subsequence of predictions falling between 79% and 81% (for example), the empirical frequency with which those predictions come true will eventually enter that range and stay within it forever. (This should hold for every such range.)
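To make the calibration property concrete, here is a minimal sketch in Python of the check it describes: restrict attention to the predictions in one probability bin and compare their empirical success frequency against that bin. The function name and toy data are illustrative, not from any particular framework.

```python
def empirical_frequency_in_bin(predictions, outcomes, lo, hi):
    """Empirical frequency of success among predictions falling in [lo, hi].

    predictions: list of probabilities in [0, 1]
    outcomes:    list of 0/1 results, aligned with predictions
    Returns None if no prediction fell in the bin.
    """
    hits = [o for p, o in zip(predictions, outcomes) if lo <= p <= hi]
    if not hits:
        return None
    return sum(hits) / len(hits)

# Toy data: five predictions near 80%, four of which come true.
preds = [0.80, 0.79, 0.81, 0.80, 0.80]
outs = [1, 1, 1, 1, 0]
freq = empirical_frequency_in_bin(preds, outs, 0.79, 0.81)
# freq is 0.8, which lies inside the bin [0.79, 0.81]
```

The calibration property asserts that, for a calibrated predictor, this empirical frequency eventually lands in the bin and stays there as more predictions accumulate, for every choice of bin.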
Of course, there will also be a number of interesting formal variations on these properties, either weakening them (EG guaranteeing them only under certain conditions) or strengthening them (EG giving explicit bounds on how badly these properties can be violated, rather than only proving that they are obeyed in the limit). For example, converging to predict 50% on adversarial sequences is a much weaker form of a calibration guarantee.