Compression is a key component, yes. See:
http://prize.hutter1.net/
http://marknelson.us/2006/08/24/the-hutter-prize/
We don’t know how we should go about making good compression algorithms—though progress is gradually being made in that area.
Your last question seems to suggest one way of thinking about how closely related the concepts of stream compression and intelligence are.
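To spell that relation out a little (my gloss, not part of the original comment): prediction and compression are essentially two views of the same problem, since arithmetic coding turns any probability model P into a code whose length for a string x is within about two bits of the model’s surprisal:

```latex
% Code length achievable for a string x when compressing with a
% probabilistic model P via arithmetic coding (a standard result):
L_P(x) \le -\log_2 P(x) + 2
```

So a better predictive model yields a shorter code, and conversely a good general-purpose compressor implicitly contains a good predictor.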
I’d rather say intelligence is a requirement for really good compression, and thus compression can serve as a reasonable measure of a lower bound on intelligence (though not even a particularly good proxy). And you can imagine an intelligent system made up of an intelligent compressor and a bunch of dumb modules. That’s not a particularly good reason to think that the best way to develop an intelligent compressor (or even a viable way) is to scale up a dumb compressor.
Since a random list of words has higher entropy than a list of grammatically correct sentences, which in turn has higher entropy than intelligible text, the best possible compression of Wikipedia would require the compressor to understand English and enough of the content to exploit semantic redundancy; so the theoretically ideal algorithm would have to be intelligent. But that doesn’t mean an algorithm that does better on the Hutter test than another is automatically more intelligent in that way. As far as I understand, the current algorithms haven’t even gone all that far in exploiting the redundancy of grammar: a text generated to be statistically indistinguishable, for those compression algorithms, from grammatical text wouldn’t have to be grammatical at all, and there seems to be no reason to believe the current methods would scale up to the theoretical ideal. Nor does there seem to be any reason to think that compression is a good niche/framework in which to build a machine that understands English. Machine translation seems better to me, and even there methods that try to exploit grammar seem to be out-competed by methods that don’t bother and rely on statistics alone. So even the impressive progress of e.g. Google Translate doesn’t seem like strong evidence that we are making good progress on the path to a machine that actually understands language (evidence, certainly, but not strong).
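To illustrate the point about current compressors, here is a minimal sketch (my own illustration, with arbitrary example sentences and an arbitrary random seed, not part of the original comment) using Python’s zlib as a stand-in for a dumb statistical compressor:

```python
# A rough sketch, using zlib as a stand-in for a "dumb" statistical compressor.
import random
import string
import zlib

english = (
    "The cat sat on the mat because the mat was warm and the cat was tired. "
    "The dog watched the cat for a while and then the dog sat on the mat as well. "
    "After a while the cat and the dog both fell asleep on the warm mat."
)

random.seed(0)

# Same words in random order: the grammatical structure is destroyed.
words = english.split()
random.shuffle(words)
scrambled_words = " ".join(words)

# Random letters of the same length: no word structure at all.
alphabet = string.ascii_lowercase + " "
random_letters = "".join(random.choice(alphabet) for _ in range(len(english)))

def compressed_size(text: str) -> int:
    """Return the zlib-compressed size of the text, in bytes."""
    return len(zlib.compress(text.encode("utf-8"), 9))

print("random letters :", compressed_size(random_letters))
print("scrambled words:", compressed_size(scrambled_words))
print("English text   :", compressed_size(english))

# Expected outcome: the random letters should compress noticeably worse than
# either English variant, while the scrambled and grammatical versions
# typically land close together; zlib exploits repeated byte sequences,
# not grammar, let alone meaning.
```

The exact byte counts are meaningless; the point is only the rough ordering, and how small the gap between the grammatical and scrambled versions is compared with the gap to pure noise.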
TL;DR: Just because tool A would be much improved by a step “and here magic happens” doesn’t mean that working on improving tool A is a good way to learn magic.
Compression details are probably not too important here.
Compression is to brains what lift is to wings. In both cases, people could see that there’s an abstract principle at work—without necessarily knowing how best to implement it. In both cases people considered a range of solutions—with varying degrees of bioinspiration.
There are some areas where we scan. Pictures, movies, audio, etc. However, we didn’t scan bird wings, we didn’t scan to make solar power, or submarines, or cars, or memories. Look into this issue a bit, and I think most reasonable people will put machine intelligence into the “not scanned” category. We already have a mountain of machine intelligence in the world. None of it was made by scanning.
And yet you build the entire case for assigning greater than 90% confidence on the unproven assertion that compression is the core principle of intelligence—the only argument you make that even addresses the main reason for considering WBE at all.
Compression is a white bear approach to AGI.
Tolstoy recounts that as a boy, his eldest brother promised him and his siblings that their wishes would come true provided that, among other things, they could stand in a corner and not think about a white bear. The compression approach to AGI seems to be the same sort of enterprise: if we just work on mere, mundane, compression algorithms and not think about AGI, then we’ll achieve it.
Do you mean “avoiding being overwhelmed by the magnitude of the problem as a whole and making steady progress in small steps” or “substituting wishful thinking for thinking about the problem”, or something else?
Using wishful thinking to avoid the magnitude of the problem.
This is Solomonoff induction:
“Solomonoff’s model of induction rapidly learns to make optimal predictions for any computable sequence, including probabilistic ones. It neatly brings together the philosophical principles of Occam’s razor, Epicurus’ principle of multiple explanations, Bayes theorem and Turing’s model of universal computation into a theoretical sequence predictor with astonishingly powerful properties.”
http://www.vetta.org/documents/IDSIA-12-06-1.pdf
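For reference, this is the standard form of the universal prior behind that claim (my restatement of textbook notation, not a quote from the paper): the weight a string x receives is the total weight of all programs for a universal prefix machine U whose output begins with x, and prediction is done with the conditional probability.

```latex
% Solomonoff's universal prior: U is a universal prefix Turing machine,
% p ranges over programs, |p| is program length, and U(p) = x* means the
% output of p begins with x.
M(x) = \sum_{p \,:\, U(p) = x*} 2^{-|p|}

% Prediction of a continuation b is then the conditional:
M(b \mid x) = \frac{M(xb)}{M(x)}
```

The sum ranges over all programs, which is also the source of the incomputability discussed further down.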
It is hard to describe the idea that Solomonoff induction bears on machine intelligence as “wishful thinking”. Prediction is useful and important—and this is basically how you do it.
But:
“Indeed the problem of sequence prediction could well be considered solved, if it were not for the fact that Solomonoff’s theoretical model is incomputable.”
and:
“Could there exist elegant computable prediction algorithms that are in some sense universal? Unfortunately this is impossible, as pointed out by Dawid.”
and:
“We then prove that some sequences, however, can only be predicted by very complex predictors. This implies that very general prediction algorithms, in particular those that can learn to predict all sequences up to a given Kolmogorov complex[ity], must themselves be complex. This puts an end to our hope of there being an extremely general and yet relatively simple prediction algorithm. We then use this fact to prove that although very powerful prediction algorithms exist, they cannot be mathematically discovered due to Gödel incompleteness. Given how fundamental prediction is to intelligence, this result implies that beyond a moderate level of complexity the development of powerful artificial intelligence algorithms can only be an experimental science.”
While Solomonoff induction is mathematically interesting, the paper itself seems to reject your assessment of it.
Not at all! I have no quarrel whatsoever with any of that (except some minor quibbles about the distinction between “math” and “science”).
I suspect you are not properly weighing the term “elegant” in the second quotation.
The paper is actually arguing that sufficiently comprehensive universal prediction algorithms are necessarily large and complex. Just so.