By your definition of meaningful information, it’s not actually clear that a strong lossless compressor wouldn’t discover and encode that meaningful information.
For example, the presence of a face in an image is presumably meaningful information. From a compression point of view, the presence of a face and its approximate pose is also information with a very large impact on lower-level feature coding: spending, say, 100 bits to represent the face and its pose could save ten times as many bits at the lowest levels. Some purely unsupervised learning systems, such as sparse coding or RBMs, do tend to find high-level features that correspond to objects (meaningful information).
Of course that does not imply that training using UL compression criteria is the best way to recognize any particular features/objects.
By your definition of meaningful information, it’s not actually clear that a strong lossless compressor wouldn’t discover and encode that meaningful information.
It could, but it also might not. My point is that compression ratio (that is, the average log-likelihood of the data under the model) is not a good proxy for "understanding", since it can be optimized to a very large extent without modeling "meaningful" information.
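To make the equivalence concrete: under an ideal (arithmetic-coding) scheme, the number of bits needed to encode data is -log2 of the probability the model assigns to it, so compression ratio is a monotone function of average log-likelihood. A minimal sketch, with toy per-symbol probabilities chosen for illustration:

```python
import math

def code_length_bits(probs):
    """Ideal code length, in bits, for a sequence of symbols whose
    model-assigned probabilities are `probs` (arithmetic-coding bound)."""
    return sum(-math.log2(p) for p in probs)

# A model that assigns higher probability to the observed data compresses
# it better, whether or not its internal features are "meaningful".
good_model = [0.5, 0.4, 0.6, 0.5]   # toy probabilities, not real data
weak_model = [0.1, 0.2, 0.1, 0.3]

print(code_length_bits(good_model))  # fewer bits
print(code_length_bits(weak_model))  # more bits
```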
Yes, good compression can be achieved without deep understanding. But a compressor with deep understanding will ultimately achieve better compression. For example, you can get good text compression with a simple bigram or trigram model, but eventually a sophisticated grammar-based model will outperform the n-gram approach.
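The n-gram point is easy to verify empirically: even moving from a context-free (unigram) model to a bigram model measurably cuts the bits per character. A minimal sketch, using add-one smoothing and in-sample evaluation on a made-up repetitive string (the exact numbers are illustrative only):

```python
import math
from collections import Counter, defaultdict

def bits_per_char(text, order):
    """Average ideal code length (bits/char) under an order-n character
    model with add-one smoothing, estimated on the text itself."""
    counts = defaultdict(Counter)
    for i in range(order, len(text)):
        counts[text[i - order:i]][text[i]] += 1
    vocab = len(set(text))
    total_bits, n = 0.0, 0
    for i in range(order, len(text)):
        ctx, ch = text[i - order:i], text[i]
        p = (counts[ctx][ch] + 1) / (sum(counts[ctx].values()) + vocab)
        total_bits += -math.log2(p)
        n += 1
    return total_bits / n

sample = "the cat sat on the mat and the cat sat on the hat " * 20
uni = bits_per_char(sample, order=0)  # no context
bi = bits_per_char(sample, order=1)   # one character of context
# The bigram model needs fewer bits per character than the unigram model.
```

A grammar-based or neural model would capture still longer-range structure and push the bits per character lower again, which is the sense in which better "understanding" eventually wins on compression.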