By your definition of meaningful information, it’s not actually clear that a strong lossless compressor wouldn’t discover and encode that meaningful information.
It could, but also it could not. My point is that compression ratio (that is, average log-likelihood of the data under the model) is not a good proxy for “understanding” since it can be optimized to a very large extent without modeling “meaningful” information.
Yes, good compression can be achieved without deep understanding. But a compressor with deep understanding will ultimately achieve better compression. For example, you can get good text compression results with a simple bigram or trigram model, but eventually a sophisticated grammar-based model will outperform the Ngram approach.
It could, but also it could not. My point is that compression ratio (that is, average log-likelihood of the data under the model) is not a good proxy for “understanding” since it can be optimized to a very large extent without modeling “meaningful” information.
Yes, good compression can be achieved without deep understanding. But a compressor with deep understanding will ultimately achieve better compression. For example, you can get good text compression results with a simple bigram or trigram model, but eventually a sophisticated grammar-based model will outperform the Ngram approach.