First: the hypotheses/programs (recall: we are representing hypotheses by computer programs in some suitable language) need to be written in a “prefix-free” representation, meaning one in which e.g. if 100111 is a valid program then nothing else beginning 100111 can be. So you don’t get two programs of length 1, four of length 2, etc. If you do this, and if you also arrange your coding so that every infinite string of bits starts with some legal program, then the sum over all legal programs of 2^-length is exactly 1. (So Pr(program) = 2^-length(program) defines a probability distribution over programs. It’s the same probability distribution as you get from the following procedure: keep generating random bits until the bits you’ve generated form a legal program.)
Second: you can’t just take Ω=1 “for simplicity”; Ω is defined to be what you get by adding up 2^-length for all programs that halt. (Remember that hypotheses are being represented as computer programs here.) Equivalently, if you imagine picking a random well-formed program by choosing random bits, Ω is the probability that the resulting program halts. So it’s certainly <1.
[EDITED to fix an ugly but hopefully not confusing typo.]
I think you may be missing two things.
First: the hypotheses/programs (recall: we are representing hypotheses by computer programs in some suitable language) need to be written in a “prefix-free” representation, meaning one in which e.g. if 100111 is a valid program then nothing else beginning 100111 can be. So you don’t get two programs of length 1, four of length 2, etc. If you do this, and if you also arrange your coding so that every infinite string of bits starts with some legal program, then the sum over all legal programs of 2^-length is exactly 1. (So Pr(program) = 2^-length(program) defines a probability distribution over programs. It’s the same probability distribution as you get from the following procedure: keep generating random bits until the bits you’ve generated form a legal program.)
Second: you can’t just take Ω=1 “for simplicity”; Ω is defined to be what you get by adding up 2^-length for all programs that halt. (Remember that hypotheses are being represented as computer programs here.) Equivalently, if you imagine picking a random well-formed program by choosing random bits, Ω is the probability that the resulting program halts. So it’s certainly <1.
[EDITED to fix an ugly but hopefully not confusing typo.]