1. Set up a camera next to a highway, and record the stream of passing cars. To compress the resulting data, you will need to develop a computational understanding of the visual appearance of automobiles. You will need theories of hubcaps, windshields, license plates, car categories, and so on.
How does that work? The data would be ridiculously complex. If you want to sacrifice quality, you could just convert everything to 3D models and compress that, but that would have a lot of error compared to the original video. If you were to compress just the video, knowledge about cars wouldn’t do any good, because there are far too many outliers: the pixel values. There might be a complex way to do it, assume the general shapes of objects and limit the possible variation of pixels in those areas, but even then there are too many exceptions for it to be useful. And how do you compress faces? You might be able to create a method that can distinguish a face, or a specific face, from something else, but then it’s not compression, it’s object recognition. The same goes for your bird example, which might be even more complex than the car example. And if you decide not to overfit, this sentence could be compressed as “it’s all made of letters and the most common letter is e, so therefore it is all E’s.”
As the OP says, “the key difference is the emphasis on large-scale compression”.
Over a long period of time, something built from 3D models eventually compresses a lot better than recorded video, for the same reason video games are made with 3D models rather than by recording every possible game scenario as a precomputed animation.
But there is always going to be error. Games don’t look realistic precisely because of this method. In the video, you could have a spot on the window where it instantly turns blue for no reason. What I’m trying to say is that there’s too much static, too many outliers. If you were to take a thousand pictures of someone, no two pictures would be the same, even though they appear to be.
Along with what rwallace said, it might be a good idea to point out that the total message length actually breaks down into three parts:
1) Specification of the encoding: in info-theoretical terms, this maps more common, long strings (observations) to shorter symbols. It plays the role of the theory, in that it implicitly says what observations are most likely. This part tells you how to “decompress” parts 2 and 3 into their original form.
2) The compressed string: this lists all the observed data, using the shortcuts given in 1). Common occurrences are easily described by short strings.
3) The “exceptions” to the theory—the observations that aren’t easily compressed into 2). The longer strings are reserved for these, since they occur the least.
So yes, even if you can’t reach perfection, any strong regularity found in the data will allow you to compress it, even if it’s a rule with part-3 exceptions.
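To make the three parts concrete, here is a toy sketch in Python; the dataset, the rule, and the per-item bit costs are all invented for illustration, not taken from any real coder:

```python
# Toy dataset: almost every value follows the rule y = 2*x,
# with a handful of exceptions.
data = {x: 2 * x for x in range(100)}
data[17], data[42], data[63] = 5, 99, 7   # the exceptions

# Part 1: the encoding / theory ("y = 2*x"), charged a flat,
# made-up cost of 64 bits.
theory_bits = 64

# Part 2: the compressed string. A point that obeys the rule needs
# no y-value at all, only a 1-bit "conforms to the rule" flag.
compressed_bits = len(data) * 1

# Part 3: the exceptions, stored raw: 7 bits for the index,
# 8 bits for the value.
exceptions = [x for x, y in data.items() if y != 2 * x]
exception_bits = len(exceptions) * (7 + 8)

total = theory_bits + compressed_bits + exception_bits
raw = len(data) * 8   # baseline: send every y-value raw at 8 bits each
print(f"theory={theory_bits}, data={compressed_bits}, "
      f"exceptions={exception_bits}, total={total}, raw={raw}")
```

With these made-up costs the three-part total comes to 209 bits against 800 bits for sending every value raw, so the theory pays for itself even though it has a few exceptions.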
Daniel’s Reusability hypothesis is correct: any insight you gain into a particular domain can be equivalently expressed as a way to compress descriptions of that domain.
However, let me again register my annoyance with this series, not on grounds that it’s wrong (so far), but that it’s just an exposition of the well-accepted minimum message length formalism. It may not be accepted by scientists, but it’s accepted here, and sciences do implicitly use it when they can’t do controlled experiments (which are not strictly necessary for science anyway), such as in astronomy and geology.
First of all, compression doesn’t have to be limited to just commonly occurring sets. For example, if I create a fractal, you don’t have to find commonly used colors or pixels that occur near each other; you can just write a program which generates the exact same fractal. When you do this you add another element to the system: time. How much time do you spend trying to compress something, or, conversely, decompressing it? And no compression system can guarantee that every observed instance can be compressed to less than its original size. In fact, there are always going to be cases where compressing a file actually increases its size, possibly significantly, but at the very least by a few bits.
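As a rough illustration of the fractal point (a sketch, not a real codec, and the sizes quoted are approximate), a few lines of Python can regenerate a Sierpinski-style bitmap on demand, trading decompression time for storage:

```python
# A short program standing in for a much larger bitmap: the classic
# Sierpinski pattern, where pixel (x, y) is set iff (x & y) == 0.
SIZE = 64

def render() -> str:
    rows = []
    for y in range(SIZE):
        rows.append("".join("#" if (x & y) == 0 else "." for x in range(SIZE)))
    return "\n".join(rows)

image = render()
program_size = 300           # rough size of this source snippet, in bytes
print(f"a ~{program_size}-byte program regenerates {len(image)} bytes of pixels")
```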
First of all, compression doesn’t have to be limited to just commonly occurring sets. For example, if I create a fractal, you don’t have to find commonly used colors or pixels that occur near each other; you can just write a program which generates the exact same fractal.
That kind of regularity can still be explained in the language of commonly occurring sets: for a fractal, you reserve a symbol for the overall structure, which is common (in that something like it occurs frequently), and another symbol to instantiate it at a particular position and scale. Time is handled similarly.
Compression amounts to saying: “This is the structure of the data. Now, here’s the remaining data to fill in any uncertainty not resolved by its structure.” The structure tells you which long strings you can substitute with short symbols.
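Here is a minimal, purely illustrative sketch of “substitute long strings with short symbols”, a toy dictionary coder whose single-entry table plays the role of the structure:

```python
# Toy dictionary coder: one long, frequently repeated string gets a
# short reserved symbol; everything else passes through unchanged.
text = "the quick brown fox " * 50 + "jumps over the lazy dog"

structure = {"\x01": "the quick brown fox "}   # the "theory": one common long string
encoded = text.replace(structure["\x01"], "\x01")

decoded = encoded
for symbol, long_string in structure.items():
    decoded = decoded.replace(symbol, long_string)
assert decoded == text

print(f"original: {len(text)} chars, encoded: {len(encoded)} chars "
      f"(plus a small table describing the structure)")
```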
And no compression system can guarantee that every observed instance can be compressed to less than its original size.
True, compression only attempts to guarantee that, for the function generating the data, you save, on average, by using a certain compression scheme. If the generating function has no regularity, then you won’t be able to compress, even on average.
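That is easy to check empirically; for example, a quick sketch using Python’s standard zlib module (exact byte counts will vary between runs):

```python
import os
import zlib

# A counting argument says no lossless scheme can shrink every input:
# random data typically comes out slightly larger, regular data shrinks.
random_bytes = os.urandom(10_000)    # no regularity to exploit
regular_bytes = b"abcd" * 2_500      # highly regular, same length

print("random :", len(random_bytes), "->", len(zlib.compress(random_bytes)))
print("regular:", len(regular_bytes), "->", len(zlib.compress(regular_bytes)))
```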
That kind of regularity can still be explained in the language of commonly occurring sets: for a fractal, you reserve a symbol for the overall structure, which is common (in that something like it occurs frequently), and another symbol to instantiate it at a particular position and scale. Time is handled similarly.
Yes, but if you have a symbol for every possible fractal, it defeats the purpose. The point was to use time as a memory resource. This also works in reverse: with caching and other methods, you can use memory as if it were time. Both, however, only work up to a point.
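The memory-for-time direction is just memoization; a standard (and deliberately generic) example using Python’s functools.lru_cache:

```python
from functools import lru_cache

# Trading memory for time: cache previously computed results so they never
# have to be recomputed. Without the cache this recursion is exponential
# in n; with it, each value is computed once and stored.
@lru_cache(maxsize=None)
def fib(n: int) -> int:
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(200))           # fast, because of the cache
print(fib.cache_info())   # shows how much memory bought that speed
```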
Yes, but if your compressed data contains both a 3-D model and a stream of outlier pixels (to be applied as delta to the result of rendering the model), this is still going to be smaller than if you didn’t have a 3-D model and just supplied the video as a pixel stream.
Mind you, if what you want is machine vision, I think in practice you will do better to work on machine vision directly than on video compression. But the compression approach is valid as a matter of mathematical principle.
Right—the delta correction has lower entropy than the raw pixel stream. Of course, you have to use a good 3-D model!
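A synthetic sketch of that claim, with both the “rendered” frame and the sensor noise invented for illustration: subtracting the model’s prediction leaves a low-entropy delta that a general-purpose compressor handles noticeably better than the raw frame.

```python
import random
import zlib

random.seed(0)
WIDTH = 4_000

# "Rendered" prediction from the 3-D model: a smooth gradient of pixel values.
rendered = [(i // 16) % 256 for i in range(WIDTH)]
# "Actual" camera frame: the prediction plus small sensor noise (the outliers).
actual = [(p + random.randint(-2, 2)) % 256 for p in rendered]
# Delta correction: what is left once the model's prediction is subtracted.
delta = [(a - r) % 256 for a, r in zip(actual, rendered)]

raw_size = len(zlib.compress(bytes(actual)))
delta_size = len(zlib.compress(bytes(delta)))
print(f"raw frame: {raw_size} B compressed, delta vs. model: {delta_size} B")
```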
Mind you, if what you want is machine vision, I think in practice you will do better to work on machine vision directly than on video compression.
Are you familiar with the field of machine vision? If not, two quick points: 1) the evaluation metrics it employs are truly terrible. 2) Almost all computer vision tasks can be reformulated as specialized image compression tricks (e.g. the stereo correspondence problem can be reformulated as a way of compressing the second image in a stereo pair by predicting the pixels using the first image and a disparity function).
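For what it’s worth, here is a toy one-dimensional version of that stereo reformulation, with a synthetic scanline pair and a made-up disparity: search for the disparity under which the left scanline best predicts the right one, then store only the disparity and the residual.

```python
import random
import zlib

random.seed(1)
N = 4_000
TRUE_DISPARITY = 7   # made up for the synthetic pair

# Synthetic stereo pair: the right scanline is the left one shifted by a
# fixed disparity, plus a little noise.
left = [random.randrange(256) for _ in range(N + 32)]
right = [(left[i + TRUE_DISPARITY] + random.randint(-1, 1)) % 256 for i in range(N)]

def residual(d):
    """What remains of the right scanline after predicting it from the left."""
    return [(right[i] - left[i + d]) % 256 for i in range(N)]

# Correspondence as compression: pick the disparity whose residual codes shortest...
best = min(range(32), key=lambda d: len(zlib.compress(bytes(residual(d)))))

# ...then transmit only that disparity plus the (low-entropy) residual.
raw = len(zlib.compress(bytes(right)))
coded = len(zlib.compress(bytes(residual(best)))) + 1   # +1 byte for the disparity
print(f"recovered disparity {best}; right scanline raw {raw} B vs predicted {coded} B")
```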
I’ll admit I’m not particularly familiar with machine vision. I was extrapolating from other areas such as natural language processing, where trying to formulate problems in terms of compression isn’t particularly helpful, but if an expert on machine vision says it is helpful in that field, fair enough.
natural language processing, where trying to formulate problems in terms of compression isn’t particularly helpful
Why do you think that? Statistical natural language processing has been very successful, both for recognition and translation of text, and for speech recognition.
Statistical natural language processing, yes, but that’s not the same thing as text compression (again, there is a mathematical sense in which one could consider them related in principle, but doing text compression and doing natural language processing were still separate activities, last I heard).
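The mathematical sense in question is Shannon’s: a statistical model assigns probabilities, and -log2 p is a code length, so in principle a better language model is a better text compressor. A small sketch of that identity, using a toy text and a unigram character model (the 27-symbol alphabet is an arbitrary choice):

```python
import math
from collections import Counter

text = (
    "statistical language models assign probabilities to text and any "
    "probability model yields a code length of minus log two p per symbol"
)

# Uniform model: treat all 27 symbols (26 letters plus space) as equally likely.
uniform_bits = len(text) * math.log2(27)

# Unigram model: use the observed character frequencies as the probabilities.
counts = Counter(text)
total = len(text)
unigram_bits = -sum(c * math.log2(c / total) for c in counts.values())

print(f"uniform model: {uniform_bits:.0f} bits")
print(f"unigram model: {unigram_bits:.0f} bits (a better model gives a shorter code)")
```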