You can say that the input tape (where the program gets loaded from) is an infinite sequence of random bits. A ‘program of length l’ is a piece of code which sets the machine up so that, after the first l bits, subsequent bits do not matter. Every short program thus corresponds to a set of longer programs (all of its extensions).
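(For concreteness, this is how the coinflip picture cashes out: the probability that a fair-coin input tape begins with a particular prefix-free program p of length ℓ(p) is exactly 2^-ℓ(p), and summing over programs whose output begins with x gives the usual universal semimeasure. Here U is the universal prefix machine and U(p) = x∗ means “the output of p starts with x”.)

```latex
P(\text{tape begins with } p) \;=\; 2^{-\ell(p)}
\qquad\Longrightarrow\qquad
M(x) \;=\; \sum_{p \,:\, U(p)=x\ast} 2^{-\ell(p)}
```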
You can. But is there any reason to think that this models well the inherent complexity of programs? Do we ever execute programs by randomly choosing the bits that constitute them? We have algorithms that utilize randomness, to be sure, but UTMs are not, normally, such algorithms. I appreciate that “choosing program bits randomly” is a simple, succinct way of getting to 2^-length(p), but I don’t see a reason to think it’s natural in any interesting sense.
Well, indeed. There’s nothing natural about that whole silly notion. The laws of physics in our universe are highly symmetric (and the fundamental laws are even time-reversible when mirrored and charge-flipped); what works for guessing the laws of physics is assuming symmetries, which is something S.I. most dramatically does not do.
Furthermore, incorrect theories in S.I. can take a very long time to rule out, because it is very easy to set up, in very few bits, a long-running busy beaver that changes the output only after many bits have been output. For an S.I.-driven agent, there are various arbitrary doomsdays looming with probabilities greater than 1/1024 (extra length of less than 10 bits).
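(Spelling out the arithmetic behind that 1/1024 figure: if the shortest program consistent with the data so far has length ℓ, then a variant that is k bits longer, say one that embeds a long-running counter and flips the output only once it halts, keeps a relative weight of 2^-k, because the common 2^-ℓ factor cancels.)

```latex
\frac{2^{-(\ell+k)}}{2^{-\ell}} \;=\; 2^{-k},
\qquad
k < 10 \;\Longrightarrow\; 2^{-k} > 2^{-10} = \tfrac{1}{1024}
```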
Furthermore, in the total absence of external input, an ideal inductor must assume it is not within a universe where the laws of physics do not support universal computation. This rules out the vast majority of the input tapes, which are nonetheless given non-zero prior probability in S.I.
Do you think a better agent should assign low probabilities to such arbitrary doomsdays in the first place? What would be a good general rule for that?
Having thought about it some more, I feel that a good induction method would start with p_doomsday(time) being a smooth function, and it would have to acquire a great many bits of information-theoretic data before that function could develop very sharp and specific peaks.
Meanwhile, S.I. starts with an enormous set of weird preconceptions due to the use of some one-dimensional Turing machine, and consequently produces really, really bad priors. The badness of those priors is somewhat masked by optimality proofs that are very easy for laymen to misinterpret.
I think that if you plot a sane agent’s probability of doomsday over time, it won’t be some really weirdly shaped curve with incredibly sharp (Planck-time sharp) peaks at various points highly specific to the internal details of the agent. Really, it’s as if someone with a weird number-fear synaesthesia looked at the list of years since the alleged birth of Christ, picked the scariest numbers, and then prophesied doomsday in those years. It is clearly completely insane.
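A toy sketch of the contrast being drawn here; this is purely illustrative and not anything S.I. itself computes. The function crude_complexity, the HORIZON range, and the HAZARD rate below are all made-up stand-ins: a real universal prefix machine would assign different (and, for “nice” times, even shorter) codes.

```python
# Toy contrast between a simplicity-weighted ("S.I.-style") prior over
# doomsday times and a smooth constant-hazard prior. The complexity
# measure is a crude, invented proxy, not a real prefix-machine code length.

HORIZON = 2 ** 17   # arbitrary range of candidate doomsday steps
HAZARD = 1e-6       # arbitrary per-step hazard rate for the smooth prior


def crude_complexity(n: int) -> int:
    """Rough stand-in for the description length (in bits) of the time n.

    Takes the cheapest of a few simple encodings: literal binary,
    'power of two', or 'mantissa times a power of ten'. The small added
    constants are guesses at the expression overhead.
    """
    candidates = [n.bit_length()]                      # literal binary
    k = n.bit_length() - 1
    if n == 2 ** k:                                    # n = 2**k
        candidates.append(k.bit_length() + 2)
    digits = str(n)
    stripped = digits.rstrip("0")
    zeros = len(digits) - len(stripped)
    if zeros:                                          # n = m * 10**zeros
        candidates.append(int(stripped).bit_length() + zeros.bit_length() + 4)
    return min(candidates)


def normalize(weights: dict) -> dict:
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


# Simplicity-weighted prior: mass proportional to 2**-complexity(t).
spiky = normalize({t: 2.0 ** -crude_complexity(t) for t in range(1, HORIZON)})

# Smooth alternative: geometric distribution, i.e. constant hazard rate.
smooth = normalize({t: HAZARD * (1.0 - HAZARD) ** (t - 1) for t in range(1, HORIZON)})

# A 'round' time versus its immediate neighbour: the simplicity-weighted
# prior puts roughly a thousand times more mass on the round one, while
# the smooth prior treats them as essentially identical.
for t in (2 ** 16, 2 ** 16 + 1):
    print(f"t={t}: spiky={spiky[t]:.3e}  smooth={smooth[t]:.3e}")
```

The point is purely qualitative: any prior that charges 2^-description-length piles its doomsday mass onto times that happen to have short descriptions, while the constant-hazard curve barely distinguishes neighbouring steps.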
Just checking—you do know the formulation of SI that uses a universal prefix Turing machine provided with fair coinflips on the input tape, right?
I don’t understand the question. I thought that’s what we were talking about. Am I missing something?
To be more explicit: setting up a UTM with random bits on the input tape is a natural-seeming way of getting the probability distribution over programs (2^-length(p)) that goes into the Solomonoff prior. But as I’m trying to say in the comment you replied to, I don’t think it’s really natural at all. And of course SI doesn’t need this particular distribution in order to be effective at its job.
Yeah, sorry. It’s me who was missing something =)