Kolomogorov complexity isn’t an option, and approximations of Kolmogorov complexity don’t have that uniqueness property.
Even if we did have that uniqueness property, it’s still not all that helpful if you’re trying to decide among a finite number of finite-complexity hypotheses. Choice of description language is still sufficient to completely determine your choice of which hypothesis is simplest.
As far as I can tell, a MDL-ish complexity measurement that is “unique up to a constant” acts like a prior—in the limit of infinite evidence, the initial bias will wash out. Kelly is fond of analogizing this “justification” of Occam to a flat tire—it helps you get home because you can eventually fix it.
Kolomogorov complexity isn’t an option, and approximations of Kolmogorov complexity don’t have that uniqueness property.
Right, but we can find a series of increasingly tight upper bounds to the KC, and the bounds are language-independent in the same sense that the KC itself is.
Then for any sufficiently large dataset, we can say that the probability of the data is given by the model that produces the tightest upper bound for the KC. This is nicely rigorous—if you think my model is unjustified and doesn’t correspond to the “real” probabilities, all you have to do is produce a new model that achieves a tighter KC-bound.
Can you help me with the question: “How do you decide which of a finite number of hypotheses, each of which is of finite complexity, is simplest, using approximations of Kolmogorov complexity?”
Also, my understanding is that people normally approximate Kolmogorov complexity by using non-Turing-complete models of computation, such as the finite-state automata that Hutter and his co-authors recently used to play Pac-Man: http://il.youtube.com/watch?v=RhQTWidQQ8U
If the model of computation is not Turing complete, then complexity measurements derived from it do not have the “uniqueness” property that we’re talking about.
A interposed question: Can you or someone point me to an introduction or starting point regarding the overall topic related to Kolomogorov complexity (so I am able to follow, not contribute)? Or is an explanation already contained in the sequences? This idea seems to be discussed quite often on LW. Thank you.
I don’t think it’s been thoroughly explained in the sequences, but why not start with the Wikipedia page? Or this post.
On second thought, this is actually a very simple idea that can be explained in a comment. Let me try:
The Kolmogorov complexity of a finite string of bytes is the size (in bytes) of the shortest Python program that outputs that string and then halts. Or you could use any other language in place of Python and get a different value of complexity—but it cannot differ by more than a constant as the string grows, because you could always implement a constant-size Python interpreter in the other language to “convert” one description into another. So complexity is not unique, but all definitions of complexity sorta “agree” as we go to infinity.
For example, if a string consists of “a” repeated a million times, its Kolmogorov complexity is quite small because you can write a very short program that outputs a million a’s and then halts. But this is not the case for all possible strings. Here’s a simple way to see why: there can’t be more Python programs of length N than there are different byte strings of length N, because any program is itself a byte string. Therefore, you cannot encode all strings of length N using programs of length N/2 or something, because there’s not enough of them. Actually this shows that most strings of length N must have complexity close to N. (Obviously it can’t be much larger than N, because any string of length N can be output by a program of length N+c that “hardcodes” that specific string.)
Another important observation. Imagine you have a huge, random-looking string of bytes. How can you calculate its Kolmogorov complexity? Turns out that you can’t, because there’s no easier way to do that than, essentially, simulating all programs up to length N and seeing what they output. But this is impossible because any given program can (say) hang idly for a million years and only then start outputting symbols, and you have no way of learning that reliably from the program text (this is known as the “halting problem”). So K-complexity is largely a theoretical notion, I have no idea why people love it so.
So K-complexity is largely a theoretical notion, I have no idea why people love it so.
Imagine some Oracle told you that obtaining the ultimate theory of physics—the one that unifies relativity and quantum gravity, and all that—was impossible. Then following your logic you might conclude that it was pointless to study physics, since one can never obtain the perfect theory. But that would be wrong; it would still be highly worthwhile to attempt to obtain better and better physical theories.
Kolmogorov-Chaitin information theory is focused on quantification—that is, on being able to measure the quantity of information in a string of characters. It doesn’t care what the string means. Information theory cares about how much information is in a string; it doesn’t care what the string means. In fact, you can say something much stronger about information theory: if you consider meaning at all, then you’re not doing information theory. A truly abstract string could mean many different things: information theory’s measurement of its information content must include everything that could be used relative to any possible system of meaning to determine the information content of a string.
Haven’t thought too hard about this question. In my mind complexity is filed away as a “mystery”, on the same shelf as frequentism vs Bayesianism, decision theory and other things. I know the state of the art and am convinced that it’s unsatisfactory, but how to fix it is unclear. You could carve out a nice research area for yourself if you took any of these questions and ran with it :-)
I have a vague idea that one shouldn’t look for a single correct notion of complexity, but instead try to find some reasonable properties that any measure should have, and study them all at once. For instance, if I propose a measure that turns out to be equivalent to square-root of K-complexity, who’s to say it’s better or worse? More seriously, one could imagine complexity measures like “the time it would take to explain in English”, “the time it would take to explain in Japanese”, “the time it would take a smart person to explain”, “the time it would take a stupid person to explain”...
But when I try to think about “properties a measure should have” all I can come up with is a kind of monotonicity: a complexity measure is a real-valued function on strings whose value on a given string is larger than the value on an (initial?) substring. That is not even true of K-complexity. (E.g. if N is an integer with high complexity, but less than 10 to the one-hundred, then a string of N repeated zeroes will have higher complexity than a string of 10 to the one-hundred zeroes.)
This suggests a new focus—drugs to treat internet addiction and promote procrastination resistance. The experimental procedure seems obvious. The double-blind, placebo controlled T4ET (Tv Tropes—Time To Exit Test).
This concept sounds like one that could be a useful addition to your ‘could’, ‘probability’ mini-series. Putting things into ‘python’ language, as you have done, makes the concepts much simpler to apply.
That mini-series is about stuff that I make up myself. K-complexity is a well known notion, maybe you could ask JoshuaZ—he was planning to write some introductory math posts.
That mini-series is about stuff that I make up myself. K-complexity is a well known notion,
Ok. I actually considered the ‘could’ post a well known notion just expressed in an intuitive format.
maybe you could ask JoshuaZ—he was planning to write some introductory math posts.
Not a bad idea. Failing that I could always write one myself. But for now I’ll just hope your comment above comes up in the custom search when I’m looking for a link target. You touched on most of the important details (albeit briefly) so that was about 2⁄3 post ready as is.
Thanks for that link, I wasn’t aware of Monte Carlo AIXI. It’s amusing how little it took to make Nesov and Roko frightened.
It’s not the specific result that’s frightening, but the overall research direction, where each result contributes exclusively to destroying the world. We’ve got a small tiny step closer to disaster, job well done.
Kolomogorov complexity isn’t an option, and approximations of Kolmogorov complexity don’t have that uniqueness property.
Even if we did have that uniqueness property, it’s still not all that helpful if you’re trying to decide among a finite number of finite-complexity hypotheses. Choice of description language is still sufficient to completely determine your choice of which hypothesis is simplest.
As far as I can tell, a MDL-ish complexity measurement that is “unique up to a constant” acts like a prior—in the limit of infinite evidence, the initial bias will wash out. Kelly is fond of analogizing this “justification” of Occam to a flat tire—it helps you get home because you can eventually fix it.
Right, but we can find a series of increasingly tight upper bounds to the KC, and the bounds are language-independent in the same sense that the KC itself is.
Then for any sufficiently large dataset, we can say that the probability of the data is given by the model that produces the tightest upper bound for the KC. This is nicely rigorous—if you think my model is unjustified and doesn’t correspond to the “real” probabilities, all you have to do is produce a new model that achieves a tighter KC-bound.
Can you help me with the question: “How do you decide which of a finite number of hypotheses, each of which is of finite complexity, is simplest, using approximations of Kolmogorov complexity?”
Also, my understanding is that people normally approximate Kolmogorov complexity by using non-Turing-complete models of computation, such as the finite-state automata that Hutter and his co-authors recently used to play Pac-Man: http://il.youtube.com/watch?v=RhQTWidQQ8U
If the model of computation is not Turing complete, then complexity measurements derived from it do not have the “uniqueness” property that we’re talking about.
Thanks for that link, I wasn’t aware of Monte Carlo AIXI. It’s amusing how little it took to make Nesov and Roko frightened.
A interposed question: Can you or someone point me to an introduction or starting point regarding the overall topic related to Kolomogorov complexity (so I am able to follow, not contribute)? Or is an explanation already contained in the sequences? This idea seems to be discussed quite often on LW. Thank you.
I don’t think it’s been thoroughly explained in the sequences, but why not start with the Wikipedia page? Or this post.
On second thought, this is actually a very simple idea that can be explained in a comment. Let me try:
The Kolmogorov complexity of a finite string of bytes is the size (in bytes) of the shortest Python program that outputs that string and then halts. Or you could use any other language in place of Python and get a different value of complexity—but it cannot differ by more than a constant as the string grows, because you could always implement a constant-size Python interpreter in the other language to “convert” one description into another. So complexity is not unique, but all definitions of complexity sorta “agree” as we go to infinity.
For example, if a string consists of “a” repeated a million times, its Kolmogorov complexity is quite small because you can write a very short program that outputs a million a’s and then halts. But this is not the case for all possible strings. Here’s a simple way to see why: there can’t be more Python programs of length N than there are different byte strings of length N, because any program is itself a byte string. Therefore, you cannot encode all strings of length N using programs of length N/2 or something, because there’s not enough of them. Actually this shows that most strings of length N must have complexity close to N. (Obviously it can’t be much larger than N, because any string of length N can be output by a program of length N+c that “hardcodes” that specific string.)
Another important observation. Imagine you have a huge, random-looking string of bytes. How can you calculate its Kolmogorov complexity? Turns out that you can’t, because there’s no easier way to do that than, essentially, simulating all programs up to length N and seeing what they output. But this is impossible because any given program can (say) hang idly for a million years and only then start outputting symbols, and you have no way of learning that reliably from the program text (this is known as the “halting problem”). So K-complexity is largely a theoretical notion, I have no idea why people love it so.
Imagine some Oracle told you that obtaining the ultimate theory of physics—the one that unifies relativity and quantum gravity, and all that—was impossible. Then following your logic you might conclude that it was pointless to study physics, since one can never obtain the perfect theory. But that would be wrong; it would still be highly worthwhile to attempt to obtain better and better physical theories.
Thanks, great! I think this might be another introductory explanation.
Here is more: Information vs. Meaning
Which notion do you prefer?
Haven’t thought too hard about this question. In my mind complexity is filed away as a “mystery”, on the same shelf as frequentism vs Bayesianism, decision theory and other things. I know the state of the art and am convinced that it’s unsatisfactory, but how to fix it is unclear. You could carve out a nice research area for yourself if you took any of these questions and ran with it :-)
I have a vague idea that one shouldn’t look for a single correct notion of complexity, but instead try to find some reasonable properties that any measure should have, and study them all at once. For instance, if I propose a measure that turns out to be equivalent to square-root of K-complexity, who’s to say it’s better or worse? More seriously, one could imagine complexity measures like “the time it would take to explain in English”, “the time it would take to explain in Japanese”, “the time it would take a smart person to explain”, “the time it would take a stupid person to explain”...
But when I try to think about “properties a measure should have” all I can come up with is a kind of monotonicity: a complexity measure is a real-valued function on strings whose value on a given string is larger than the value on an (initial?) substring. That is not even true of K-complexity. (E.g. if N is an integer with high complexity, but less than 10 to the one-hundred, then a string of N repeated zeroes will have higher complexity than a string of 10 to the one-hundred zeroes.)
So many topics, so little time! :)
It’s amazing how much a person can do if some topic manages to interest them more than the Internet, even for a little while.
A timely reminder. I’d better go back to obsessing about nootropics and see if I cannot amaze myself somewhat.
Damn Lesswrong and its “Recent Comments:” and its orange envelope icon.
When you reach a result, be sure to post it and make the Internet a little more alluring for all of us :-)
This suggests a new focus—drugs to treat internet addiction and promote procrastination resistance. The experimental procedure seems obvious. The double-blind, placebo controlled T4ET (Tv Tropes—Time To Exit Test).
This concept sounds like one that could be a useful addition to your ‘could’, ‘probability’ mini-series. Putting things into ‘python’ language, as you have done, makes the concepts much simpler to apply.
That mini-series is about stuff that I make up myself. K-complexity is a well known notion, maybe you could ask JoshuaZ—he was planning to write some introductory math posts.
Ok. I actually considered the ‘could’ post a well known notion just expressed in an intuitive format.
Not a bad idea. Failing that I could always write one myself. But for now I’ll just hope your comment above comes up in the custom search when I’m looking for a link target. You touched on most of the important details (albeit briefly) so that was about 2⁄3 post ready as is.
Sorry, I thought it was old news in LW circles. Here’s the actual paper:
http://jveness.info/publications/veness_rl_via_aixi_approx.pdf
It’s not the specific result that’s frightening, but the overall research direction, where each result contributes exclusively to destroying the world. We’ve got a small tiny step closer to disaster, job well done.