Hm… I’ve been thinking about it for a while (http://www.gwern.net/mugging) and still don’t have any really satisfactory argument.
Now, it’s obvious that you don’t want the prior probability to shrink any faster or slower than the reward grows, because either way lands you in difficulties. But do we have any non-ad-hoc reason to specifically scale the prior for a reward of size n by 1/n, that is, to regard the complexity as equal to the size?
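To make the knife-edge explicit, here is a sketch of the usual expected-value arithmetic (with n the promised reward and everything else held fixed; none of this is in dispute, it just locates the problem):

\[
\mathbb{E}[\text{payoff} \mid n] \;=\; \Pr(\text{offer is genuine} \mid n) \cdot n .
\]

If the prior falls off more slowly than 1/n, the mugger wins just by naming a larger n; if it falls off faster, you end up dismissing arbitrarily good genuine offers; only \(\Pr \propto 1/n\) keeps the product bounded without zeroing it out, which is why the question is whether complexity can be made to track size.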
I think there may be a Kolmogorov complexity argument that size does equal complexity, and welcome anyone capable of formally dealing with it. My intuition goes: Kolmogorov complexity is about lengths of strings in some language which map onto outputs. By the pigeonhole principle, not all outputs have short inputs. But various languages will favor different outputs with the scarce short inputs, much like a picture compressor won’t do so hot on music files. 3^^^^3 has a short representation in the usual mathematical language, but that language is just one of indefinitely many possible mathematical languages; there are languages where 3^^^^3 has a very long representation, one 3^^^^3 symbols long even, or longer still! Just like there are inputs to zip which compress very well and inputs which aren’t so great.
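As a toy version of the compressor analogy (a sketch using Python’s standard zlib standing in for zip; the particular inputs are arbitrary):

```python
import random
import zlib

# A highly regular input: the kind of thing the compressor was built for.
regular = b"3" * 100_000

# A pseudo-random input of the same length: essentially incompressible.
random.seed(0)
noisy = bytes(random.randrange(256) for _ in range(100_000))

print(len(zlib.compress(regular)))  # on the order of a hundred bytes
print(len(zlib.compress(noisy)))    # roughly 100,000 bytes; by pigeonhole, most inputs look like this
```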
Now, given that our particular language favors 3^^^^3 with such a short encoding rather than many other numbers relatively nearby to 3^^^^3, doesn’t Pascal’s mugging start looking like an issue with our language favoring certain numbers? It’s not necessary to favor 3^^^^3, other languages don’t favor it or actively disfavor it. Those other languages would seem to be as valid and logically true as the current one, just the proofs or programs may be longer and written differently of course.
So, what do you get when you abstract away from the particular outputs favored by a language? When you add together and average the languages with long encodings and the ones with short encodings for 3^^^^3? What’s the length of the average program for 3^^^^3? I suggest it’s 3^^^^3, with every language that gives it a shorter encoding exactly counterbalanced by a language that gives it a correspondingly longer one.
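Writing the intuition down (only a sketch; how to weight an infinite set of languages is exactly what is left unspecified here): for each language L, let \(K_L(x)\) be the length of the shortest L-program for x, and consider

\[
\bar{K}(x) \;=\; \operatorname{avg}_{L}\, K_L(x), \qquad \text{conjecture:}\ \bar{K}(3\uparrow\uparrow\uparrow\uparrow 3) \approx 3\uparrow\uparrow\uparrow\uparrow 3,
\]

the languages that give it a short encoding cancelling against those that give it a long one.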
> What’s the length of the average program for 3^^^^3? I suggest it’s 3^^^^3, with every language that gives it a shorter encoding exactly counterbalanced by a language that gives it a correspondingly longer one.
For a sufficiently crazy set of languages you could make this true for 3^^^^3, but in general what’s simple in one language is still fairly simple elsewhere. If 3+3 takes b bits to describe in language A, it takes b+c bits in language B, where c is the length of the shortest interpreter for language A written in language B (edit: or less :P).
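To spell out the standard result being appealed to, the invariance theorem: for any two universal languages A and B there is a constant \(c_{A,B}\), the length of the shortest interpreter for A written in B, such that for every string x

\[
K_B(x) \;\le\; K_A(x) + c_{A,B}.
\]

The constant depends on the pair of languages but not on x, so nothing simple in A is more than \(c_{A,B}\) bits harder in B; the catch is that an adversarially chosen B can make \(c_{A,B}\) itself astronomically large, which is what the ‘sufficiently crazy set of languages’ concession is about.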
So, I only have a mystical folk understanding of algorithmic probability theory and thus can’t formally deal with it, but it seems to be a practical problem that you can’t look at all possible languages or average them to get a meaningful ultimate low-level language. Currently it seems that the only way to pick out a set of languages as the relevant ones is to look at your environment, which I think isn’t a formal concept yet. And even if it were implementable, it seems you’d still end up with a ton of continuously-diverging languages where ‘3^^^3’ had lower Kolmogorov complexity than, say, ‘1208567128390567103857.42 times the number of nations officially recognized by the government of France, squared’. It would also be weird to expect ‘googol minus 17’ to have lower Kolmogorov complexity across basically any set of languages than ‘googol’. If I horribly misinterpreted your argument somewhere along the way, I apologize.
Well, it may be a practical problem, but it doesn’t bother me that this ‘averaged’ complexity language is uncomputable—most things to do with KC are uncomputable, after all. The question is what justification we can come up with for a proportional prior.
> Currently it seems that the only way to pick out a set of languages as the relevant ones is to look at your environment, which I think isn’t a formal concept yet.
We can apparently talk sensibly about different languages and different Turing machines. The language for complexity may be a language underneath whatever the real Turing machine is, corresponding to starting states of that Turing machine—much like dovetailing runs all possible programs. This is probably one of the things that any formalization would have to grapple with.
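For concreteness, a minimal sketch of the dovetailing part (init_state and step are hypothetical stand-ins for whatever machine model you pick; nothing here is specific to the mugging):

```python
from itertools import count

def all_programs():
    """Enumerate every finite binary string: '', '0', '1', '00', '01', ..."""
    yield ""
    for n in count(1):
        for i in range(2 ** n):
            yield format(i, f"0{n}b")

def dovetail(init_state, step):
    """Interleave execution so every program eventually gets arbitrarily many
    steps: in round r, start the r-th program, then advance every started
    program by one step. init_state and step are hypothetical hooks."""
    states = []
    programs = all_programs()
    while True:
        states.append(init_state(next(programs)))
        states = [step(s) for s in states]
```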
> you’d still end up with a ton of continuously-diverging languages where ‘3^^^3’ had lower Kolmogorov complexity than, say, ‘1208567128390567103857.42 times the number of nations officially recognized by the government of France, squared’.
I don’t follow.
> It would also be weird to expect ‘googol minus 17’ to have lower Kolmogorov complexity across basically any set of languages than ‘googol’.
I don’t see any weirdness about any number smaller than googol having a smaller complexity; that’s exactly the property that stops a mugging: monotonically increasing complexity.
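A sketch of that property in symbols, assuming the prior is \(2^{-K(n)}\) and, crucially, a complexity measure with \(K(n) \ge \log_2 n\) for all n (which ordinary Kolmogorov complexity does not give you for numbers like 3^^^^3; closing that gap is what the averaged-language idea is for):

\[
\Pr(n)\cdot n \;\le\; 2^{-K(n)}\cdot n \;\le\; 2^{-\log_2 n}\cdot n \;=\; 1,
\]

so the mugger gains nothing by naming a bigger number.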
Why would it be weird? I think we have a use-mention problem here. You are confusing A̡͊͠͝ with a fairly complicated procedure which generates the number A̡͊͠͝ using an offset from a random but even larger number you have apparently blessed with the short representation ‘googol’. (Why you would favor this ‘googol’ I have no idea, when A̡͊͠͝ is so much more useful in my idiosyncratic branch of the multiverse.)
Perhaps I should ask a clarifying question about these symbols you’re interested in finding the Kolmogorov complexity of across an infinite set of languages. Are they real numbers? Are they real numbers that would be interpreted from their contexts as representing differential reward (however that is objectively defined)? Are they numbers-in-general? Are they quantitative or qualitative symbols in general? Are they any symbols, quantitative or qualitative, that can be interpreted from their contexts as representing differential reward (however that is objectively defined)? We may be talking about different things.
They would be finite binary strings, I think. Anything else, and I’m not sure how to apply pigeonhole.
Right, but you can convert any of the above into (approximate) finite binary strings by interpreting from their contexts how they would have been represented as such, no? I guess why I’m asking is that I expect a lot of number-like things to show up again and again across the multiverse, and that some of these numbers are going to be mathematically more interesting than others, just as Graham’s number might be discovered by alien mathematicians while neither we nor the aliens care about the number 315167427357825136347, which is basically infinitely smaller. An infinite ensemble is infinite, but we expect math to be very similar in large swaths of it at least. So my intuitions are reasonably confident that your proposal would only work if we’re limiting our search to some set of quantities that doesn’t include the mathematically interesting ones that skew everything (like ‘infinity’).
> I guess why I’m asking is that I expect a lot of number-like things to show up again and again across the multiverse, and that some of these numbers are going to be mathematically more interesting than others, just as Graham’s number might be discovered by alien mathematicians while neither we nor the aliens care about the number 315167427357825136347, which is basically infinitely smaller. An infinite ensemble is infinite, but we expect math to be very similar in large swaths of it at least.
What’s the smallest un-interesting number? But isn’t that a rather interesting number...
Graham’s number may be interesting to us and aliens a lot like us, but so what? I doubt it’s interesting over all or even most of, say, a Tegmark level IV multiverse.
> So my intuitions are reasonably confident that your proposal would only work if we’re limiting our search to some set of quantities that doesn’t include the mathematically interesting ones that skew everything (like ‘infinity’).
Limiting your search is, I think, exactly why our current abstractions are a bad base for a universal prior like what is being discussed in the Mugging.