I shouldn’t be replying this late at night, as my clarity suffers a lot, and this subject demands a lot of clarity.
There are meaningful ways to describe the Kolmogorov complexity of world-testing programs and their combinations, and there is a meaningful equivalent of conditional probability. But you don’t use them in a prior directly. When you want to define a prior with world-testing programs, you need to do the moral equivalent of padding every program with enough entropy to fully describe the world. So the program “numbers are 2, 4, 6” converts without any entropy padding, the program “numbers are even” needs a lot of extra entropy, and “numbers are whatever Jesus wants them to be” requires as much entropy as an empty program.
My excuse is that when you sum over every possible padding, as in (1/2)^length(“A and B” + padding bits 0000) + (1/2)^length(“A and B” + padding bits 0001) + …, what you get is simply (1/2)^length(“A and B”) times a small language-dependent constant, namely (1/2)^length(program to convert entropy bits and a world-testing program into a world-generating program).
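A quick numeric check of that collapse, as a sketch in Python; the program length L and padding length k below are arbitrary illustrative values, not anything fixed by the discussion:

    # Summing (1/2)^(L + k) over all 2^k paddings of a fixed length k
    # gives back exactly (1/2)^L: the padding itself costs no probability
    # mass, only the constant-length conversion program does.
    L = 11   # hypothetical length in bits of the world-testing program
    k = 4    # padding length: bits 0000 through 1111

    total = sum((1 / 2) ** (L + k) for _padding in range(2 ** k))
    assert total == (1 / 2) ** L
    print(total)   # 0.00048828125 == (1/2)**11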
So you have (see the sketch after this list):
Length of world-testing program
Entropy needed to get world-generating program from world-testing program
Probability that world-testing program matches the world
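A minimal sketch of how those three pieces fit together, under my own illustrative choices (the tester, the 4-bits-per-number decoding, and the sequence length are all made up for the example):

    import random

    def tester(seq):
        # World-testing program: "numbers are even".
        return all(n % 2 == 0 for n in seq)

    def generator(entropy_bits):
        # World-generating program = tester + entropy: the entropy bits
        # select one concrete even sequence out of all sequences that pass.
        seq = []
        for i in range(0, len(entropy_bits), 4):
            chunk = entropy_bits[i:i + 4]
            seq.append(int(chunk, 2) * 2)   # doubling guarantees evenness
        return seq

    bits = "".join(random.choice("01") for _ in range(12))
    world = generator(bits)
    assert tester(world)   # the generated world always matches the tester
    print(bits, "->", world)

A complete description like “numbers are 2, 4, 6” is then the degenerate case where the generator needs zero entropy bits.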
For complete descriptions of the world (no extra entropy needed) that don’t have any shorter descriptions, the complexity of the description is very closely related to Kolmogorov complexity: “sequence is 50 times 42” is shorter than “sequence is 9, 0, 2, 9, 3, 2, 7, 5, 8, 1, 4, 9, 8, 7, 1, 0, 6, 2, 3, 4, 4, 8, 5, 7, 4, 4, 7, 0, 1, 2, 8, 3, 9, 0, 9, 0, 3, 2, 8, 4, 6, 0, 4, 7, 8, 3, 4, 2, 3, 6, 3”, both are unambiguous and about maximally concise (I generated the second randomly), so the first is more likely than the second.
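The same comparison as a crude measurement, treating character count as a stand-in for program length in bits (a simplification; real Kolmogorov complexity is uncomputable):

    # The two descriptions from above, compared purely by length.
    concise = "sequence is 50 times 42"
    literal = ("sequence is 9, 0, 2, 9, 3, 2, 7, 5, 8, 1, 4, 9, 8, 7, 1, 0,"
               " 6, 2, 3, 4, 4, 8, 5, 7, 4, 4, 7, 0, 1, 2, 8, 3, 9, 0, 9, 0,"
               " 3, 2, 8, 4, 6, 0, 4, 7, 8, 3, 4, 2, 3, 6, 3")

    print(len(concise), len(literal))   # 23 vs 163 characters
    # Under a (1/2)^length prior the shorter description dominates:
    print(2.0 ** (len(literal) - len(concise)))   # ratio of roughly 1.4e42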
For incomplete descriptions it’s more complicated, but the mathematics should be straightforward enough.
Thanks, this angle looks interesting, and new to me. Could you explain some more how it’s supposed to work? For example, if the input is an infinite stream of numbers and my hypothesis is something like “next number will be 42” or “prime-numbered members will be prime”, should I pad it with infinite additional entropy?
Normally streams are assumed to be finitely long, with symbols from a finite set, to keep the math neat.
The extension to infinitely long streams with finite complexity is conceptually fairly straightforward, using a trick very similar to a non-normalizable Bayesian prior (just as a uniform prior over the real numbers is well defined even though it integrates to infinity, so it’s not really a probability; or the entropy of a molecule when atom locations are unbounded continuous variables), but the math can get considerably messier.
Finite or not, streams must be countable.
So just select some ordering, and for a stream-tester T and number N the world generator is “i=0; for(w in streams) { if(T(w)) { i += 1; if(i == N) { print(w); break; } } }”. There are multiple ways to choose the iteration order and the encoding for N (the simplest would be a fixed-bit-size number, but for an infinite number of possibilities you obviously need a more complicated encoding), but they’re just a constant away from each other, so you can ignore this detail.
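A runnable rendering of that pseudocode, as a sketch; I’ve restricted it to finite 0/1 streams and picked an arbitrary ordering (by length, then lexicographically) so the loop actually terminates:

    from itertools import count, product

    def streams():
        # Enumerate all finite 0/1 streams: by length, then lexicographically.
        # Any fixed computable ordering works; choices differ by a constant.
        for length in count(0):
            for s in product((0, 1), repeat=length):
                yield s

    def world_generator(T, N):
        # Print the N-th stream (1-indexed) that the tester T accepts.
        i = 0
        for w in streams():
            if T(w):
                i += 1
                if i == N:
                    print(w)
                    return w

    # Toy tester: "stream has even length"; its 3rd match is (0, 1).
    world_generator(lambda w: len(w) % 2 == 0, 3)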
Then why did you mention things like length(“A and B”) in your first comment? You can’t take the conjunction of two programs, can you? I’m confused.