There are different ways to formalize the same thing. One of them seems less confusing than the others.
Suppose you have a universal Turing machine with one input tape, one work tape, and one output tape. The machine reads from the input tape (which works monotonically, i.e. like cin in C++), prints to the output tape (also monotonic, like cout in C++), and computes on the work tape (initialized with zeroes). The input can be set up so as to load a program onto the work tape, including an emulator for any other such machine.
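To make the setup concrete, here is a minimal sketch of that three-tape interface in Python (a toy of my own, not a real universal machine; the names are hypothetical):

```python
class MonotoneMachine:
    """Toy sketch of the three-tape setup described above:
    input is consumed bit by bit, output is append-only,
    and the work tape starts zeroed."""

    def __init__(self, next_bit, work_size=1024):
        self.next_bit = next_bit     # source of input bits (like cin)
        self.work = [0] * work_size  # work tape, initialized with zeroes
        self.output = []             # output tape (like cout)

    def read(self):
        # Monotone input: each bit is read once and never unread.
        return self.next_bit()

    def write(self, bit):
        # Monotone output: bits can be appended, never retracted.
        self.output.append(bit)
```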
The prior probability of observing a sequence of bits x is the probability that x gets printed when the input tape is an infinite sequence of random bits. Just that. (The probability that the input sequence starts with n specific bits is 2^-n.)
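This definition suggests a direct (if hopelessly slow) Monte Carlo estimate: feed the machine fresh random bits and count how often the output begins with x. A sketch, where `run_machine` is a hypothetical stand-in for the universal machine above, and the step cap is a practical concession since some inputs never print enough bits:

```python
import random

def estimate_prior(x, run_machine, trials=100_000, max_steps=10_000):
    """Estimate the prior of bit-tuple x by sampling random input tapes
    and checking whether the printed output begins with x."""
    hits = 0
    for _ in range(trials):
        out = run_machine(next_bit=lambda: random.getrandbits(1),
                          max_steps=max_steps, want_bits=len(x))
        if out is not None and tuple(out[:len(x)]) == tuple(x):
            hits += 1
    return hits / trials
```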
When we want the probability of future observations y after observing x, we ask for the probability that y is printed immediately after x, given that x is printed.
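In symbols (my notation, not in the original wording), writing M for the prior just defined, this is an ordinary conditional probability:

```latex
M(y \mid x) \;=\; \frac{M(xy)}{M(x)},
```

where M(xy) is the prior probability that the output begins with x immediately followed by y.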
As for the correspondence between input strings and theories, it’s not very straightforward.
For example, codes that include junk bits (bits whose values do not affect the output) are longer than codes without any junk bits. Yet, since the junk bits can take any value, it is not correspondingly unlikely that one of these codes is encountered. Simply put, a theory like f = ma also shows up as an immense number of inane variants like f = ma·(0.1+0.9) + 0·(0.2365912354+pi) + …; each inane variant is individually improbable, but the sum of the probabilities of such inane theories is not small.
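A quick sanity check of that last claim (my arithmetic, under the simplifying assumption that the machine plainly ignores the junk bits): a minimal code of length n padded with k junk bits yields 2^k distinct codes, each with prior 2^-(n+k), and together they retain the full weight of the minimal code:

```latex
\underbrace{2^{k}}_{\text{junk settings}} \cdot \underbrace{2^{-(n+k)}}_{\text{each padded code}} \;=\; 2^{-n}.
```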