Is it important that it be lossless compression? I can look at a picture of a face and know that it’s a face. If you switched a bunch of pixels around, or blurred parts of the image a little bit, I’d still know it was a face. To me it seems relevant that it’s a picture of a face, but not as relevant what all the pixels are. Does AI need to be able to do lossless compression to have understanding?
I suppose the response might be that if you have a bunch of pictures of faces, and know that they’re faces, then you ought to be able to get some mileage out of that. And even if you’re trying to remember all the pixels, there’s less information to store if you’re just diff-ing from what your face-understanding algorithm predicts is most likely. Is that it?
Well, lossless compression implies understanding. Lossy compression may or may not imply understanding.
Also, usually you can get a lossy compression algorithm from a lossless one. In image compression, the lossless method would typically be to send a scene description plus a low-entropy correction image; you can easily save bits by just skipping the correction image.
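To make that recipe concrete, here is a minimal sketch (purely illustrative, not any real codec): the per-block mean colors stand in for the “scene description”, keeping the correction image makes the scheme lossless, and dropping it gives a lossy variant for free.

```python
import numpy as np

def encode(image, block=8):
    # "Scene description" stand-in: per-block mean colors (a crude, low-rate
    # summary of the image). A real system would use a much richer model.
    h, w = image.shape[:2]
    small = image.reshape(h // block, block, w // block, block, -1).mean(axis=(1, 3))
    prediction = np.round(np.repeat(np.repeat(small, block, axis=0), block, axis=1))
    # Correction (residual) image: whatever the summary fails to predict.
    residual = image.astype(np.int16) - prediction.astype(np.int16)
    return small, residual  # in practice, entropy-code both parts

def decode_lossless(small, residual, block=8):
    prediction = np.round(np.repeat(np.repeat(small, block, axis=0), block, axis=1))
    return prediction.astype(np.int16) + residual

def decode_lossy(small, block=8):
    # Skip the correction image entirely: far fewer bits, no longer exact.
    return np.round(np.repeat(np.repeat(small, block, axis=0), block, axis=1)).astype(np.uint8)

img = np.random.default_rng(0).integers(0, 256, (256, 256, 3), dtype=np.uint8)
small, residual = encode(img)
assert np.array_equal(decode_lossless(small, residual), img)
```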
I emphasize lossless compression because it enables strong comparisons between competing methods.
Not really: lossless compression doesn’t imply understanding, at least not until you start to approach the Kolmogorov complexity of the data.
In a natural image, most of the information is low-level detail that has little or no human-relevant meaning: textures, background, lighting properties, minuscule shape details, lens artifacts, lossy compression artifacts (if the image was crawled from the Internet it was probably a JPEG originally), and so on. Much of this detail is highly redundant and/or can be modeled well by priors, so a lossless compression algorithm can be very good at finding an efficient encoding of it.
A typical image used in machine learning contests is 256 × 256 × 3 × 8 ≈ 1.57 million bits. How many bits of meaningful information (*) could it possibly contain? 10? 100? 1000? Whatever the number is, the non-meaningful information certainly dominates, so an efficient lossless compression algorithm could achieve an extremely good compression ratio without modeling, and thus without understanding, any of the meaningful information.
(* Consider the meaningful information of an image to be the number of yes-or-no questions about the image that a human would normally be interested in and could answer by looking at the image, where for each question the probability of the answer being true is approximately 50% over the data set, and the set of questions is designed so that a human learns as much as possible by asking as few questions as possible, e.g. something like a game of 20 questions.)
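A back-of-the-envelope version of that accounting (the 1,000-bit figure for meaningful information and the 10:1 ratio are just assumed for illustration):

```python
total_bits = 256 * 256 * 3 * 8        # raw pixel data: 1,572,864 bits
meaningful_bits = 1_000               # generous assumed upper bound (see * above)
compressed_bits = total_bits // 10    # suppose a 10:1 lossless compressor

# Even in the compressed stream, almost every bit pays for low-level
# detail rather than "meaningful" content.
print(total_bits, compressed_bits, meaningful_bits / compressed_bits)
# 1572864 157286 ~0.0064
```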
I agree with your general point that working on lossless compression requires the researcher to pay attention to details that most people would consider meaningless or irrelevant. In my own text compression work, I have to pay a lot of attention to things like capitalization, comma placement, the difference between Unicode quote characters, and so on. However, I have three responses to this as a critique of the research program:
The first response is to say that nothing is truly irrelevant. Or, equivalently, the vision system should not attempt to make the relevance distinction. Details that are irrelevant in everyday tasks might suddenly become very relevant in a crime scene investigation (where did this shadow at the edge of the image come from...?). Also, even if a detail is irrelevant at the top level, it might be relevant in the interpretation process; certainly shadowing is very important in the human visual system.
The second response is that while it is difficult and time-consuming to worry about details, this is a small price to pay for the overall goal of objectivity and methodological rigor. Human science has always required a large amount of tedious lab work and unglamorous experimental work.
The third response is to say that even if some phenomenon is considered irrelevant by “end users”, scientists are interested in understanding reality for its own sake, not for the sake of applications. So pure vision scientists should be very interested in, say, categorizing textures, modeling shadows and lighting, and characterizing lens artifacts. (In my interactions with computer graphics people, I have in fact found exactly this tendency.)
By your definition of meaningful information, it’s not actually clear that a strong lossless compressor wouldn’t discover and encode that meaningful information.
For example, the presence of a face in an image is presumably meaningful information. From a compression point of view, the presence of a face and its approximate pose is also information that has a very large impact on lower-level feature coding: spending, say, 100 bits to represent the face and its pose could save 10x as many bits at the lowest levels. Some purely unsupervised learning systems (sparse coding or RBMs, for example) do tend to find high-level features that correspond to objects, i.e. meaningful information.
Of course, that does not imply that training with unsupervised-learning compression criteria is the best way to recognize any particular features or objects.
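For what it’s worth, here is a minimal sketch of the kind of unsupervised learner mentioned above: a binary RBM trained with one-step contrastive divergence (CD-1). The data below is a random placeholder; trained on binarized natural-image patches, the hidden units tend to become edge- or part-like feature detectors. This is only an illustration, not any particular system discussed here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden=64, lr=0.1, epochs=10, batch=32, seed=0):
    """Binary RBM trained with one-step contrastive divergence (CD-1)."""
    rng = np.random.default_rng(seed)
    n_samples, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v = np.zeros(n_visible)
    b_h = np.zeros(n_hidden)
    for _ in range(epochs):
        for v0 in np.array_split(data, max(1, n_samples // batch)):
            # Positive phase: hidden probabilities given the data.
            ph0 = sigmoid(v0 @ W + b_h)
            h0 = (rng.random(ph0.shape) < ph0).astype(float)
            # Negative phase: one Gibbs step back to a "reconstruction".
            pv1 = sigmoid(h0 @ W.T + b_v)
            ph1 = sigmoid(pv1 @ W + b_h)
            # CD-1 update: move model statistics toward data statistics.
            W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
            b_v += lr * (v0 - pv1).mean(axis=0)
            b_h += lr * (ph0 - ph1).mean(axis=0)
    return W, b_v, b_h

# Placeholder data: 1000 random 8x8 binary "patches". Real experiments would
# use binarized patches cut from natural images.
patches = (np.random.default_rng(1).random((1000, 64)) > 0.5).astype(float)
W, b_v, b_h = train_rbm(patches)
```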
“By your definition of meaningful information, it’s not actually clear that a strong lossless compressor wouldn’t discover and encode that meaningful information.”
It could, but it also could not. My point is that compression ratio (that is, average log-likelihood of the data under the model) is not a good proxy for “understanding”, since it can be optimized to a very large extent without modeling “meaningful” information.
Yes, good compression can be achieved without deep understanding. But a compressor with deep understanding will ultimately achieve better compression. For example, you can get good text compression results with a simple bigram or trigram model, but eventually a sophisticated grammar-based model will outperform the n-gram approach.
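To make the n-gram baseline concrete, here is a rough sketch (not anyone’s actual compressor): a character trigram model with add-one smoothing, whose average -log2 probability on held-out text approximates the bits per character an arithmetic coder driven by the same model would spend. The file name is just a placeholder.

```python
import math
from collections import defaultdict

def trigram_bits_per_char(train_text, test_text):
    """Estimated code length (bits/char) under a character trigram model
    with add-one smoothing over the combined character vocabulary."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(2, len(train_text)):
        counts[train_text[i - 2:i]][train_text[i]] += 1
    vocab_size = len(set(train_text) | set(test_text))
    total_bits = 0.0
    for i in range(2, len(test_text)):
        ctx, ch = test_text[i - 2:i], test_text[i]
        ctx_counts = counts[ctx]
        p = (ctx_counts[ch] + 1) / (sum(ctx_counts.values()) + vocab_size)
        total_bits += -math.log2(p)
    return total_bits / max(1, len(test_text) - 2)

with open("sample.txt", encoding="utf-8") as f:   # any plain-text file
    text = f.read()
split = int(0.9 * len(text))
print(trigram_bits_per_char(text[:split], text[split:]))
```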
Huh? Understanding by whom? What exactly does the zip compressor understand?