Does compression research have any analogue of training behavior in ML research?
Watched a talk by Ilya Sutskever about generalization. Over the course of it he covered the equivalence of prediction and compression, and from there connected it to maximum likelihood. He framed the discussion by saying that thinking about learning this way had been a helpful perspective for him since around 2016.
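For concreteness, here's my own toy restatement of that equivalence (not anything from the talk): if a model assigns probability p to each next symbol, an ideal entropy coder spends about -log2(p) bits on it, so the total code length of a sequence is just its base-2 negative log-likelihood, and minimizing code length is the same objective as maximum likelihood.

```python
import math

# Toy sketch of prediction == compression (my illustration, not from the talk):
# an ideal entropy coder spends -log2(p) bits on a symbol the model predicted
# with probability p, so the total code length of a sequence is its negative
# log-likelihood in bits. Better prediction -> shorter code.

def ideal_code_length_bits(per_symbol_probs):
    """Bits an ideal coder needs, given the model's probability for each observed symbol."""
    return sum(-math.log2(p) for p in per_symbol_probs)

print(ideal_code_length_bits([0.9] * 100))  # confident model: ~15 bits for 100 symbols
print(ideal_code_length_bits([0.5] * 100))  # coin-flip model: 100 bits
```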
In the talk he makes an analogy to Kolmogorov complexity, then extends it so that conditional Kolmogorov complexity can be thought of as conditional compression, and uses that when talking about why unsupervised learning might work.
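My gloss on that step, using the standard definitions rather than the talk's exact notation:

```latex
% K(y)     = length of the shortest program that outputs y
% K(y | x) = length of the shortest program that outputs y when given x as input
% Conditioning can only help, up to an additive constant:
K(y \mid x) \;\le\; K(y) + O(1)
% Rough reading: a near-optimal compressor of Y given X has to exploit whatever
% structure X shares with Y, which is the sense in which good conditional
% compression looks like unsupervised learning.
```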
At 38:30 there’s a question/comment about where the Kolmogorov analogy breaks down: in the Kolmogorov setting the order of the data doesn’t matter, but in training neural networks it does, and as a result they have training behaviors.
Is this something that has ever shown up in compression research, about which I know almost nothing? I realize that in the typical case compression is applied to things we can read in full easily, like files on a computer, and is mostly about trading off resources like storage or bandwidth against time or CPU, but that’s because the applied cases are so well developed. Does compression research contain work on “general compressor” algorithms the way a transformer is a general predictor? Do compression algorithms exhibit compression behaviors the way a neural network exhibits training behaviors?
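To make concrete what I mean by “compression behaviors”, here’s a toy sketch I put together (my own construction, not a claim about any particular compressor from the literature): an adaptive order-0 byte model, the kind of thing simple adaptive arithmetic coders build on, updates its counts as it reads the stream, so its per-symbol cost changes over the course of the file, loosely like a training curve.

```python
import math
from collections import Counter

# Hypothetical illustration: an adaptive order-0 model (Laplace-smoothed byte
# counts) gets cheaper per symbol as it sees more of the stream. The cost it
# reports is the -log2(p) bits an ideal arithmetic coder would spend.

def adaptive_costs_bits(data: bytes):
    counts = Counter()
    seen = 0
    costs = []
    for b in data:
        p = (counts[b] + 1) / (seen + 256)  # Laplace smoothing over 256 byte values
        costs.append(-math.log2(p))         # bits for this symbol under the current model
        counts[b] += 1
        seen += 1
    return costs

costs = adaptive_costs_bits(b"ab" * 500)
print(sum(costs[:100]) / 100)   # early: expensive, the model hasn't adapted yet
print(sum(costs[-100:]) / 100)  # late: cheap, the model has "learned" the stream
```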
This seems worth a search or two.