Upvoted. I think

This drive maximizes interestingness, the first derivative of subjective beauty or compressibility, that is, the steepness of the learning curve.

is a cool way to quantify interestingness. I wonder whether future posts might compare this to other possible measures of interestingness, such as Gelman's use of the Kullback-Leibler divergence between the prior and the posterior model (i.e., the relative entropy between the two).
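To make the KL idea concrete, here is a minimal sketch (my own toy example, not from the post or paper) of measuring "learning progress" in bits as the relative entropy between a posterior and a prior over a few discrete outcomes:

```python
import math

def kl_divergence_bits(posterior, prior):
    """KL(posterior || prior) in bits: the expected number of extra bits
    needed to encode samples from the posterior using a code optimized
    for the prior -- one way to quantify how much was learned."""
    return sum(p * math.log2(p / q) for p, q in zip(posterior, prior) if p > 0)

# Toy example: a belief over 4 outcomes before and after seeing data.
prior     = [0.25, 0.25, 0.25, 0.25]
posterior = [0.70, 0.10, 0.10, 0.10]
print(round(kl_divergence_bits(posterior, prior), 3))  # 0.643 bits of progress
```

An update that leaves the distribution unchanged scores zero, so only genuine belief revision counts as "interesting" under this measure.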
ETA: Never mind; I just read the paper you linked to, and the author mentions it:
Note that the concepts of Huffman coding [28] and relative entropy between prior and posterior immediately translate into a measure of learning progress reflecting the number of saved bits—a measure of improved data compression. Note also, however, that the naive probabilistic approach to data compression is unable to discover more general types of algorithmic compressibility. For example, the decimal expansion of π looks random and incompressible but isn’t: there is a very short algorithm computing all of π, yet any finite sequence of digits will occur in π’s expansion as frequently as expected if π were truly random, that is, no simple statistical learner will outperform random guessing at predicting the next digit from a limited time window of previous digits. More general program search techniques are necessary to extract the underlying algorithmic regularity.
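The π example can be seen directly in code. Here is a small illustrative sketch (using Gibbons' unbounded spigot algorithm, which is my choice of short π-generating program, not something the paper specifies): a few lines of arithmetic generate the whole digit sequence, yet the digit frequencies look uniform, which is why a purely statistical learner sees nothing to compress:

```python
from collections import Counter

def pi_digits(n):
    """First n decimal digits of pi via Gibbons' unbounded spigot
    algorithm -- a very short program for a statistically random-looking
    sequence, i.e., algorithmic (not statistical) compressibility."""
    out = []
    q, r, t, k, m, x = 1, 0, 1, 1, 3, 3
    while len(out) < n:
        if 4 * q + r - t < m * t:
            out.append(m)  # next digit is confirmed; emit it
            q, r, m = 10 * q, 10 * (r - m * t), 10 * (3 * q + r) // t - 10 * m
        else:
            q, r, t, k, m, x = (q * k, (2 * q + r) * x, t * x, k + 1,
                                (q * (7 * k + 2) + r * x) // (t * x), x + 2)
    return out

digits = pi_digits(2000)
print(digits[:10])  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
freqs = Counter(digits)
print({d: round(freqs[d] / 2000, 3) for d in range(10)})  # each near 0.1
```

The generator is a couple of hundred characters, so the first 2000 digits compress enormously in the algorithmic sense, even though no digit-frequency statistic reveals this.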
Still, I wonder whether KL divergence (or another entropic measure) is an improvement over pure compressibility in some settings, which would imply a trade-off between the two approaches.