You can say that the input tape (where the program gets loaded from) is an infinite sequence of random bits. A ‘program of length l’ is a piece of code which sets the machine up so that, after the first l bits, subsequent bits do not matter. Every short program thus corresponds to a set of longer programs (all of its extensions).
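(For concreteness, this is how the coinflip picture cashes out: the probability that a fair-coin input tape begins with a particular prefix-free program p of length ℓ(p) is exactly 2^-ℓ(p), and summing over programs whose output begins with x gives the usual universal semimeasure. Here U is the universal prefix machine and U(p) = x∗ means “the output of p starts with x”.)

```latex
P(\text{tape begins with } p) \;=\; 2^{-\ell(p)}
\qquad\Longrightarrow\qquad
M(x) \;=\; \sum_{p \,:\, U(p)=x\ast} 2^{-\ell(p)}
```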
You can. But is there any reason to think that this models well the inherent complexity of programs? Do we ever execute programs by randomly choosing the bits that constitute them? We have algorithms that utilize randomness, to be sure, but UTMs are not, normally, such algorithms. I appreciate that “choosing program bits randomly” is a simple, succinct way of getting to 2^-length(p), but I don’t see a reason to think it’s natural in any interesting sense.
Well, indeed. There’s nothing natural about that whole silly notion. The laws of physics in our universe are highly symmetric (and the fundamental laws are even time-reversible when mirrored and charge-flipped); what works for guessing the laws of physics is assuming symmetries, which is something S.I. most dramatically does not do.
Furthermore, incorrect theories in S.I. can take a very long time to rule out, because it is very easy to set up, in very few bits, a long-running busy beaver that changes the output only after many bits have been output. For an S.I.-driven agent, there are various arbitrary doomsdays looming with probabilities greater than 1/1024 (extra length of less than 10 bits).
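(Spelling out the arithmetic behind that 1/1024 figure: if the shortest program consistent with the data so far has length ℓ, then a variant that is k bits longer, say one that embeds a long-running counter and flips the output only once it halts, keeps a relative weight of 2^-k, because the common 2^-ℓ factor cancels.)

```latex
\frac{2^{-(\ell+k)}}{2^{-\ell}} \;=\; 2^{-k},
\qquad
k < 10 \;\Longrightarrow\; 2^{-k} > 2^{-10} = \tfrac{1}{1024}
```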
Furthermore, in the total absence of external input, an ideal inductor must assume it is not within a universe where the laws of physics do not support universal computation. This rules out the vast majority of the input tapes, which are nonetheless given non-zero prior probability in S.I.
Do you think a better agent should assign low probabilities to such arbitrary doomsdays in the first place? What would be a good general rule for that?
Having thought about it some more, I feel that a good induction method would start with p_doomsday(time) being a smooth function, and it would have to acquire a great many bits of information-theoretic data before that function could develop very sharp and specific peaks.
Meanwhile, S.I. starts with an enormous set of weird preconceptions due to the use of some one-dimensional Turing machine, and consequently produces really, really bad priors. The badness of those priors is somewhat masked by optimality proofs that are very easy for laymen to misinterpret.
I think that if you plot a sane agent’s probability of doomsday over time, it won’t be some really weirdly shaped curve with incredibly sharp (Planck-time sharp) peaks at various points highly specific to the internal details of the agent. Really, it’s as if someone with a weird number-fear synaesthesia looked at the list of years since the alleged birth of Christ, picked the scariest numbers, and then prophesied doomsday in those years. It is clearly completely insane.
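A toy sketch of the contrast being drawn here; this is purely illustrative and not anything S.I. itself computes. The function crude_complexity, the HORIZON range, and the HAZARD rate below are all made-up stand-ins: a real universal prefix machine would assign different (and, for “nice” times, even shorter) codes.

```python
# Toy contrast between a simplicity-weighted ("S.I.-style") prior over
# doomsday times and a smooth constant-hazard prior. The complexity
# measure is a crude, invented proxy, not a real prefix-machine code length.

HORIZON = 2 ** 17   # arbitrary range of candidate doomsday steps
HAZARD = 1e-6       # arbitrary per-step hazard rate for the smooth prior


def crude_complexity(n: int) -> int:
    """Rough stand-in for the description length (in bits) of the time n.

    Takes the cheapest of a few simple encodings: literal binary,
    'power of two', or 'mantissa times a power of ten'. The small added
    constants are guesses at the expression overhead.
    """
    candidates = [n.bit_length()]                      # literal binary
    k = n.bit_length() - 1
    if n == 2 ** k:                                    # n = 2**k
        candidates.append(k.bit_length() + 2)
    digits = str(n)
    stripped = digits.rstrip("0")
    zeros = len(digits) - len(stripped)
    if zeros:                                          # n = m * 10**zeros
        candidates.append(int(stripped).bit_length() + zeros.bit_length() + 4)
    return min(candidates)


def normalize(weights: dict) -> dict:
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


# Simplicity-weighted prior: mass proportional to 2**-complexity(t).
spiky = normalize({t: 2.0 ** -crude_complexity(t) for t in range(1, HORIZON)})

# Smooth alternative: geometric distribution, i.e. constant hazard rate.
smooth = normalize({t: HAZARD * (1.0 - HAZARD) ** (t - 1) for t in range(1, HORIZON)})

# A 'round' time versus its immediate neighbour: the simplicity-weighted
# prior puts roughly a thousand times more mass on the round one, while
# the smooth prior treats them as essentially identical.
for t in (2 ** 16, 2 ** 16 + 1):
    print(f"t={t}: spiky={spiky[t]:.3e}  smooth={smooth[t]:.3e}")
```

The point is purely qualitative: any prior that charges 2^-description-length piles its doomsday mass onto times that happen to have short descriptions, while the constant-hazard curve barely distinguishes neighbouring steps.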
Just checking—you do know the formulation of SI that uses a universal prefix Turing machine provided with fair coinflips on the input tape, right?
I don’t understand the question. I thought that’s what we were talking about. Am I missing something?
To be more explicit: setting up a UTM with random bits on the input tape is a natural-seeming way of getting the probability distribution over programs (2^-length(p)) that goes into the Solomonoff prior. But as I’m trying to say in the comment you replied to, I don’t think it’s really natural at all. And of course SI doesn’t need this particular distribution in order to be effective at its job.
Yeah, sorry. It’s me who was missing something =)