I think the intuition here is basically that of the everything-list’s “white rabbit” problem. If you consider e.g. all programs at most 10^100 bits in length, there will be many more long than short programs that output a given mind. But I think the standard answer is that most of those long programs will just be short programs with irrelevant junk bits tacked on?
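(To make the counting concrete, here’s a minimal Python sketch, with a short bit string p standing in for “a short program that outputs the mind”: among all programs of length L, the fraction that are just p followed by arbitrary junk is 2^-len(p), independent of L, which is about all the junk-bits reply amounts to as I understand it.)

```python
from itertools import product

def padded_fraction(p, L):
    """Fraction of length-L bit strings that begin with prefix p.

    Brute-force count over all 2**L strings; the answer is 2**-len(p),
    independent of L, which is the counting fact behind the usual
    junk-bits reply to the white-rabbit worry.
    """
    total = 0
    with_prefix = 0
    for bits in product("01", repeat=L):
        total += 1
        if "".join(bits).startswith(p):
            with_prefix += 1
    return with_prefix / total

if __name__ == "__main__":
    p = "1011"  # hypothetical stand-in for a short program that outputs the mind
    for L in range(len(p), len(p) + 5):
        print(L, padded_fraction(p, L), 2 ** -len(p))
```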
I basically don’t understand such arguments as applied to real-world cosmology, i.e. computing programs and not discovering them. ’Cuz if we’re talking about cosmology, aren’t we assuming that at some point some computation is going to occur? If so, there’s a very short program that outputs a universal dovetailer that computes all programs of arbitrary length, which in turn includes programs that repeatedly output universal dovetailers for all programs at most 10^5 bits in length, and so on… and it’s just not clear to me which generators win out in the end, whether short and short-biased or long and long-biased, how that depends on choice of language, or generally what the heck is going on.
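(By “universal dovetailer” I just mean the usual interleaving schedule; here’s a toy Python sketch where plain generators stand in for programs and a round-robin loop stands in for the universal machine. It only shows the scheduling idea, not an actual construction.)

```python
from itertools import count, islice

def toy_program(i):
    """Stand-in for 'program number i': just a generator that yields
    forever. In the real argument this would be a universal machine
    stepping through the i-th program."""
    k = 0
    while True:
        yield (i, k)
        k += 1

def dovetail():
    """Classic dovetailing schedule: at stage n, add program n and run
    every program started so far for one more step, so each program
    eventually gets unboundedly many steps even though there are
    infinitely many programs."""
    running = []
    for n in count():
        running.append(toy_program(n))
        for prog in running:
            yield next(prog)

if __name__ == "__main__":
    # First 15 (program, step) pairs produced by the dovetailer.
    print(list(islice(dovetail(), 15)))
```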
Warning! Almost assuredly blithering nonsense: (Actually, in that scenario aren’t there logical attractors for programs to output 0, 1, 10, 11, …, which results in a universal distribution/generator constructed from the uniform generator, which then goes on to compute whatever universe we would have seen from an original universal distribution anyway? This self-organization looks suspiciously like getting information from nowhere, but those computations must cost negentropy if they’re not reversible. If they are reversible, then how? Reversible by what? Anyway, that is information as seen from outside the system, which might not be meaningful; information from any point inside the system seems like it might be lost with each irreversible computation? Bleh, speculations.)
(ETA: Actually, couldn’t we just run some simulations of this argument, or translate it into terms of Hashlife, and see what we get? My hypothesis is that as we compute all programs of length x = 0, x++ till infinity, the binary outputs of all computations, when sorted into identical groups, converge on a universal-prior-like distribution, though for small values of x the convergence is swamped by choice of language. I have no real reason to suspect this hypothesis is accurate or even meaningful.)
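(Here’s roughly the simulation I mean, in Python, with a deliberately dumb, non-universal toy interpreter standing in for whatever real language you’d actually use: enumerate every bitstring of length x, run each under a step cap, and tally identical outputs. The interpreter and its opcode table are made up purely for illustration.)

```python
from collections import Counter
from itertools import product

def run_toy(program, max_steps=100):
    """Toy interpreter, a made-up stand-in for a real universal machine:
    read the program two bits at a time; 00 emits 0, 01 emits 1,
    10 repeats the last emitted bit, 11 halts."""
    out, i, steps = [], 0, 0
    while i + 1 < len(program) and steps < max_steps:
        op = program[i:i + 2]
        if op == "00":
            out.append("0")
        elif op == "01":
            out.append("1")
        elif op == "10" and out:
            out.append(out[-1])
        elif op == "11":
            break
        i += 2
        steps += 1
    return "".join(out)

def output_distribution(x):
    """Run every program of length x and count identical outputs."""
    counts = Counter()
    for bits in product("01", repeat=x):
        counts[run_toy("".join(bits))] += 1
    return counts

if __name__ == "__main__":
    for x in (4, 6, 8):
        print(x, output_distribution(x).most_common(5))
```

(The step cap doesn’t matter for this toy interpreter, which always halts, but with a universal language in its place it would be the only way to keep the enumeration finite.)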
(ETA 2: Bleh, forgot about the need to renormalize outputs by K complexity (i.e. maximum lossless compression), ’cuz so many programs will output “111111111111...”. I don’t even know whether that’s meaningful, or whether it undermines the entire point. Brain doesn’t like math.)
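(Sketch of the renormalization I mean, using zlib’s compressed length as a very crude stand-in for K complexity; real K complexity is uncomputable, and general-purpose compression only catches the obvious redundancy, but it’s enough to stop “111111111111...”-type outputs from each counting as its own distinct output.)

```python
import zlib
from collections import Counter

def compressed_size(s):
    """Length of the zlib-compressed output, used as a rough upper-bound
    proxy for Kolmogorov complexity."""
    return len(zlib.compress(s.encode("ascii")))

def renormalized_counts(outputs):
    """Group raw outputs by how small they compress to, so that highly
    repetitive strings pile up in the low-complexity buckets instead of
    each being tallied separately."""
    counts = Counter()
    for out in outputs:
        counts[compressed_size(out)] += 1
    return counts

if __name__ == "__main__":
    outputs = ["1" * 50, "1" * 51, "10" * 25, "1101001100010110"]
    print(renormalized_counts(outputs))
```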
Warning! Almost assuredly blithering nonsense: Hm, is this more informative if instead we consider programs between 10^100 and 10^120 bits in length? Should it matter all that much how long they are? If we can show convergence upon characteristic output distributions for various reasonably large sets of all programs of bit lengths a to b, a < b, between 0 and infinity, then we can perhaps make some weak claims about “attractive” outputs for programs of arbitrary length. I speculated in my other reply to your comment that after maximally compressing all of the outputs we might get some neat distribution (whatever the equivalent of the normal distribution is for enough arbitrary program outputs in a given language after compression), though it’s probably something useless that doesn’t explain anything; I’m not sure that compressing the results doesn’t just destroy the entire point of getting the outputs in the first place. (Instead maybe we’d run all the outputs as programs, repeatedly; side question: if you keep doing this, how quickly does the algorithm weed out non-halting programs?) Chaitin would smile upon such methods, I think, even if he’d be horrified at my complete bastardization of pretend math, let alone math?
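(And here’s roughly the band-comparison check, again in Python with the same made-up toy interpreter from the earlier sketch standing in for a real language: build the output distribution over all programs of lengths a to b, build it over another band, and measure how far apart the two distributions are. The step cap is also the crude answer to the non-halting side question: you can’t actually weed non-halters out, you just cut every run off after a fixed budget and keep whatever partial output you have.)

```python
from collections import Counter
from itertools import product

def run_toy(program, max_steps=100):
    """Same made-up toy interpreter as in the earlier sketch:
    00 emits 0, 01 emits 1, 10 repeats the last bit, 11 halts."""
    out, i, steps = [], 0, 0
    while i + 1 < len(program) and steps < max_steps:
        op = program[i:i + 2]
        if op == "00":
            out.append("0")
        elif op == "01":
            out.append("1")
        elif op == "10" and out:
            out.append(out[-1])
        elif op == "11":
            break
        i += 2
        steps += 1
    return "".join(out)

def band_distribution(a, b):
    """Normalized output distribution over all programs of lengths a..b."""
    counts = Counter()
    for length in range(a, b + 1):
        for bits in product("01", repeat=length):
            counts[run_toy("".join(bits))] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def total_variation(p, q):
    """Total variation distance between two output distributions."""
    return 0.5 * sum(abs(p.get(k, 0) - q.get(k, 0)) for k in set(p) | set(q))

if __name__ == "__main__":
    # Do adjacent length bands produce similar output distributions?
    print(total_variation(band_distribution(4, 7), band_distribution(8, 11)))
```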