The simplicity prior says you should assign a prior probability of 2^-L to a description of length L. This sort of makes intuitive sense, since it’s what you’d get if you generated the description through a series of coinflips...
… except there are 2^L descriptions of length L, so the total prior probability you’re assigning is sum over L of 2^L * 2^-L = sum over L of 1, which diverges, i.e. the prior is unnormalisable.
You can kind of recover this by noticing that not all bitstrings correspond to an actual description, and for some encodings their density is low enough that it can be normalised (the fraction of valid length-L descriptions has to fall off faster than 1/L: a 1/L fraction still gives the divergent harmonic series, whereas something like 1/L^2 is summable)...
...but if that’s the case, you’re being fairly information-inefficient, because you could compress descriptions further. So why judge simplicity using such a wasteful encoding, and why weight by 2^-L if length no longer tracks complexity properly? And other questions in this cluster.
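Here’s a toy numeric sketch of the regimes I have in mind (my own illustration, not a standard construction; `frac` is just the assumed fraction of length-L bitstrings that count as valid descriptions):

```python
# Each length L contributes frac(L) * 2^L * 2^-L = frac(L) to the total
# prior mass, so normalisability is just whether sum_L frac(L) converges.

def total_mass(frac, max_len):
    return sum(frac(L) for L in range(1, max_len + 1))

# Every bitstring valid: each length contributes 1, so the partial sums grow
# without bound and the prior can't be normalised.
print(total_mass(lambda L: 1.0, 1000))       # 1000.0, keeps growing

# A 1/L fraction valid: the harmonic series, which still diverges
# (just very slowly), so 1/L density isn't quite enough.
print(total_mass(lambda L: 1 / L, 1000))     # ~7.5, grows like log(max_len)

# A 1/L^2 fraction valid: summable (tends to pi^2/6 ~ 1.64), so this density
# of valid descriptions can be normalised.
print(total_mass(lambda L: 1 / L**2, 1000))  # ~1.64, converges
```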
I am confused (and maybe too hung up on something idiosyncratic to an intuitive description I heard).
Was this meant to be a reply to my bit about the Solomonoff prior?
If so: in the algorithmic information theory literature, they usually fix the unnormalizability by talking about prefix Turing machines, which amounts to only allowing TM descriptions that form a valid prefix code.
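As a toy check (made-up code words, not anything from the literature), the Kraft inequality is what does the work here: any prefix-free set of descriptions has sum(2^-L) <= 1, so the 2^-L weights already form a (sub)probability measure, whereas without prefix-freeness the mass can exceed 1:

```python
def is_prefix_free(codes):
    """True if no code word is a proper prefix of another."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

def kraft_sum(codes):
    return sum(2.0 ** -len(c) for c in codes)

prefix_free = ["0", "10", "110", "111"]   # a complete prefix code
overlapping = ["0", "1", "00", "01"]      # "0" is a prefix of "00" and "01"

print(is_prefix_free(prefix_free), kraft_sum(prefix_free))  # True 1.0
print(is_prefix_free(overlapping), kraft_sum(overlapping))  # False 1.5
```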
But it is a good point that for steeper discounting rates, you don’t need to do that.
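To spell that out (my gloss, not part of the original point): with a c^-L discount for c > 2, the total mass over all 2^L strings of every length is a convergent geometric series, no prefix-free restriction needed.

```python
# Quick sanity check: sum_L 2^L * c^-L = sum_L (2/c)^L converges for c > 2.
# With c = 4 it sums to exactly 1.
c = 4
print(sum(2**L * c**-L for L in range(1, 200)))  # ~1.0 (up to float rounding)
```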
It was inspired by yours—when I read your post I remembered that there was this thing about Solomonoff induction that I was still confused about—though I wasn’t directly trying to answer your question so I made it its own thread.