The problem is that the Solomonoff prior picks out 3^^^3 as much more likely than most of the numbers of the same magnitude because it has much lower Kolmogorov complexity.
I’m not familiar with Kolmogorov complexity, but isn’t the apparent simplicity of 3^^^3 just an artifact of what notation we happen to have invented? I mean, “^^^” is not really a basic operation in arithmetic. We have a nice compact way of describing what steps are needed to get from a number we intuitively grok, 3, to 3^^^3, but I’m not sure it’s safe to say that makes it simple in any significant way. For one thing, what would make 3 a simple number in the first place?
In the nicest possible way, shouldn’t you have stopped right there? Shouldn’t the appearance of this unfamiliar and formidable-looking word have told you that I wasn’t appealing to some intuitive notion of complexity, but to a particular formalisation that you would need to be familiar with to challenge? If instead of commenting you’d Googled that term, you would have found the Wikipedia article that answered this and your next question.
As a rough estimate of the complexity of a number, you can take the number of lines of the shortest program that computes it from basic operations.
More formally, replace lines of a program with states of a Turing machine.
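As a concrete sketch of what I mean (Python standing in for the Turing machine, with a function name I just made up), the up-arrow hierarchy takes only a few lines to define, which is exactly why 3^^^3 gets a short description:

    def up(a, n, b):
        # Knuth's up-arrow hierarchy: up(a, 1, b) = a**b,
        # and up(a, n, b) is "a followed by n up-arrows followed by b".
        if n == 1:
            return a ** b
        if b == 0:
            return 1
        return up(a, n - 1, up(a, n, b - 1))

    # 3^^^3 is up(3, 3, 3): a handful of lines pins down a number far too
    # large to ever evaluate, so its description is short even though its
    # magnitude is enormous. (Don't actually call this.)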
But what numbers are you allowed to start the computation with? Why can’t I say that, for example, 12,345,346,437,682,315,436 is one of the numbers I can do computation from (as a starting point), and thus it has extremely small complexity?
You could say this—doing so would be like describing your own language in which things involving 12,345,346,437,682,315,436 can be expressed concisely.
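A toy sketch of what that would amount to (the two “languages” here are my own invented stand-ins, not anything standard): take a baseline description language, then extend it with a one-symbol primitive for your favourite constant.

    BIG = 12_345_346_437_682_315_436

    def describe_in_A(n):
        # Baseline language: spell the number out in binary.
        return format(n, "b")

    def describe_in_B(n):
        # Same language, extended with a one-symbol primitive for BIG.
        return "#" if n == BIG else format(n, "b")

    print(len(describe_in_A(BIG)))  # 64 symbols
    print(len(describe_in_B(BIG)))  # 1 symbol: "extremely small complexity"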
So Kolmogorov complexity is somewhat language-dependent. However, given two languages in which you can describe numbers, you can compute a constant such that the complexity of any number is off by at most that constant between the two languages. (The constant is more or less the complexity of describing one language in the other.) So things aren’t actually too bad.
But if we’re just talking about Turing machines, we presumably express numbers in binary, in which case writing “3” can be done very easily, and all you need to do to specify 3^^^3 is to make a Turing machine computing ^^^.
However, given two languages in which you can describe numbers, you can compute a constant such that the complexity of any number is off by at most that constant between the two languages.
But can’t this constant itself be arbitrarily large when talking about arbitrary numbers? (Of course, for any specific number, it is limited in size.)
Well… Given any number N, you can in principle invent a programming language where the program

    do_it

outputs N.

The constant depends on the two languages, but not on the number. As army1987 points out, if you pick the number first, and then make up languages, then the difference can be arbitrarily large. (You could go in the other direction as well: if your language specifies that no number less than 3^^^3 can be entered as a constant, then it would probably take approximately log(3^^^3) bits to specify even small numbers like 1 or 2.)
But if you pick the languages first, then you can compute a constant based on the languages, such that for all numbers, the optimal description lengths in the two languages differ by at most a constant.
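A rough sketch of why, in Python of my own devising (this is the shape of the standard invariance argument, not a proof, and INTERPRETER_B_IN_A is a hypothetical placeholder): write, once and for all, an interpreter for language B as a program in language A; prepending it turns any B-description into an A-description, so the two description lengths differ by at most the interpreter’s length, which was fixed before any particular number came up.

    # Toy setup: descriptions are strings, and we assume (hypothetically)
    # a fixed interpreter for language B written as language-A code.
    INTERPRETER_B_IN_A = "...fixed A-code that runs any B-description..."
    c = len(INTERPRETER_B_IN_A)  # depends on the two languages only

    def a_description_from_b(desc_b):
        # Any B-description becomes an A-description exactly c symbols longer.
        return INTERPRETER_B_IN_A + desc_b

    demo = a_description_from_b("B-code-for-12345")
    print(len(demo) - len("B-code-for-12345") == c)  # True: overhead is exactly c

    # So for every number n: K_A(n) <= K_B(n) + c, and symmetrically with
    # some c' in the other direction; both constants are chosen before n.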
The context in which this comes up here generally requires something like “there’s a way to compare the complexity of numbers which always produces the same results independent of language, except in a finite set of cases. Since that set is finite and my argument doesn’t depend on any specific number, I can always base my argument on a case that’s not in that set.”
If that’s how you’re using it, then you don’t get to pick the languages first.
You do get to pick the languages first because there is a large but finite (say no more than 10^6) set of reasonable languages-modulo-trivial-details that could form the basis for such a measurement.