That might be useful in some context, yes. But you’ll still have both the longer program, which gives probability 1 to the observations so far, and the shorter program, which gives a much, much smaller probability (on the order of 2^-L, where L is the length difference).
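To make the competition concrete, here is a toy calculation (the lengths are made up for illustration, not taken from the discussion): under the standard 2^-length prior, a stochastic program of length L0 that assigns probability 2^-L to the observed bits contributes exactly the same posterior weight as a program of length L0 + L that hardcodes those bits and assigns them probability 1.

```python
# Illustrative lengths (hypothetical): L0 = length of the stochastic
# program, L = number of observed "random-looking" bits it must explain.
L0 = 100
L = 40

# Posterior contribution of each program = prior weight * likelihood.
stochastic_contrib = 2 ** -L0 * 2 ** -L        # 2^-L0 prior, 2^-L likelihood
deterministic_contrib = 2 ** -(L0 + L) * 1.0   # 2^-(L0+L) prior, likelihood 1

# The two contributions tie exactly: the extra L bits of program length
# cost precisely as much as the 2^-L likelihood penalty.
print(stochastic_contrib == deterministic_contrib)
```

So under the vanilla weighting neither kind of hypothesis outcompetes the other; the tie is exact, not approximate.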
I think the hypotheses in physics correspond not to individual programs in SI, but to common prefixes. For example, QM would describe a program that would convert subsequent random bits into predicted observations with the correct probability distribution. You could treat a prefix as a probability distribution over outputs—the probability distribution that results from prepending that prefix to random bits. The bonus is that very simple operations on the random bits correspond to high-level mathematical functions that are otherwise difficult to write in any low-level, TM-like representation.
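A minimal sketch of the "prefix as a distribution" idea: treat the prefix as a fixed function applied to a stream of fair coin flips, and read off the distribution it induces on outputs. The particular prefix here (counting heads in 3 flips) is a made-up example, chosen because its induced distribution is easy to verify by hand.

```python
import random

def prefix_program(bits):
    # Hypothetical "prefix": consumes 3 random bits and outputs their sum.
    # The induced output distribution should be Binomial(3, 1/2).
    return sum(next(bits) for _ in range(3))

def random_bits(rng):
    # The "subsequent random bits" appended after the prefix.
    while True:
        yield rng.randint(0, 1)

# Estimate the induced distribution over outputs {0, 1, 2, 3} by Monte Carlo.
rng = random.Random(0)
N = 100_000
counts = [0, 0, 0, 0]
for _ in range(N):
    counts[prefix_program(random_bits(rng))] += 1
est = [c / N for c in counts]
# est should be close to (1/8, 3/8, 3/8, 1/8)
```

The point is that the prefix itself is short and deterministic; all the randomness lives in the appended bits, and the pair jointly defines a semimeasure over observation sequences.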
Yeah, I agree, there’s this inconvenient competition between shorter programs that encode a probability distribution and longer ones that “hardcode” the random bits.
Here’s a random idea, I’m not sure if it works. Formulating SI as a mixture of semimeasures doesn’t require any particular weighting of these semimeasures, you just need all weights to be nonzero. We can try making the weights of longer programs decrease faster than 2^-L, so the shorter programs outcompete the longer ones.
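A toy version of this reweighting idea, with made-up program lengths and an arbitrarily chosen faster-decaying prior (2^-(length²/100); nothing canonical about that choice): under the usual 2^-length prior the short stochastic program and the long hardcoding program tie, but under the faster-decaying prior the shorter program dominates.

```python
L0, L = 100, 40   # hypothetical: stochastic program length, data length

def posterior_log2(prior_log2, loglik_log2):
    # Work in log2 space to avoid floating-point underflow.
    return prior_log2 + loglik_log2

# (a) Standard 2^-length prior: the two contributions tie exactly.
std_stochastic = posterior_log2(-L0, -L)           # -140
std_hardcoding = posterior_log2(-(L0 + L), 0)      # -140

# (b) Weights decreasing faster than 2^-length, e.g. 2^-(length^2 / 100):
# the extra L bits of length are now penalized superlinearly, so the
# shorter (stochastic) program outcompetes the hardcoding one.
fast_stochastic = posterior_log2(-(L0 ** 2) / 100, -L)        # -140.0
fast_hardcoding = posterior_log2(-((L0 + L) ** 2) / 100, 0)   # -196.0
```

All weights stay nonzero, so this is still a legal mixture of semimeasures; only the relative ordering of hypotheses changes.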
You’ll hit the opposite problem, I think: the winning program can itself implement some approximation to SI. It would be bad prediction-wise, too—the dominating hypotheses would tend to have a doomsday (because when you approximate SI you have to set a limit on the number of steps). And if you can only learn that the doomsday hypothesis was wrong after the doomsday did not happen… it’s not good.
I don’t think the distinction between ‘data’ and ‘program’ is fundamental in any way. In any universe where a Turing machine (or a bounded TM) can exist, the data is a program too.
In general, I agree that reasoning about approximations to SI is trickier than reasoning about SI itself. But I don’t see how we can confidently say things like “any method of approximating SI will have such-and-such problem”. Such statements seem to be caused by limitations of our imagination, not limitations of SI. I propose to wrap up this discussion and agree to disagree, at least until we come up with some hard math about the limitations of SI.
I was originally speaking of vanilla SI, not of some not-yet-defined approximation methods.
And I don’t think it’s even a limitation of SI. It just seems to me that the code-vs-data distinction is, in the case of machines that can run interpreters*, entirely arbitrary, subjective, and a matter of personal taste.
* and not only can run interpreters, but ultimately should, if we are to represent a world that contains computers.