I don’t follow… Can you elaborate on how some specific form of compression could do that?
I can’t explain the whole philosophy here, but basically the idea is: you have two theories, A and B. You instantiate them as lossless data compressors, and invoke the compressors on the dataset. The one that produces a shorter net codelength (including the length of the compressor program itself) is superior. In practice the rival theories will probably be very similar and produce different predictions (= probability distributions over observational outcomes) only on small regions of the dataset.
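To make the accounting concrete, here is a toy sketch in Python. The two "theories" are just stock codecs standing in for real theory-compressors, and the program sizes are made-up placeholders; only the bookkeeping is the point.

```python
import bz2
import lzma

# Stand-ins for theory A and theory B: any program that maps the dataset to a
# shorter bitstring and can losslessly recover it would do.
def compress_with_theory_a(data: bytes) -> bytes:
    return bz2.compress(data)

def compress_with_theory_b(data: bytes) -> bytes:
    return lzma.compress(data)

def net_codelength(encoded: bytes, program_size_bytes: int) -> int:
    # Net codelength = encoded data plus the size of the compressor program,
    # so a theory can't win by smuggling the dataset into its own source code.
    return len(encoded) + program_size_bytes

def compare(data: bytes) -> str:
    a = net_codelength(compress_with_theory_a(data), program_size_bytes=4_000)
    b = net_codelength(compress_with_theory_b(data), program_size_bytes=9_000)
    if a == b:
        return "tie"
    return "theory A is superior" if a < b else "theory B is superior"
```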
Lossless data compression is a highly rigorous evaluation principle. Many theories are simply not well-specified enough to be built into compressors; these theories, I say (reformulating Popper and Yudkowsky), should not be considered scientific. If the compressor implementation contains any bugs, they will show up immediately, because the decoded data will fail to agree exactly with the original. Finally, even if the theory is scientific and the implementation is correct, it remains to be seen whether the theory is empirically accurate: by the No Free Lunch theorem, a compressor can only shorten the domain’s data if the theory captures real regularities in that data.
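The bug-catching property is just a round-trip check. A minimal sketch, again with a stock codec standing in for the theory-compressor:

```python
import lzma

def verified_codelength(data: bytes) -> int:
    # Compress, decompress, and insist on exact agreement with the original.
    # Any bug in the encoder/decoder pair surfaces here immediately.
    encoded = lzma.compress(data)
    decoded = lzma.decompress(encoded)
    if decoded != data:
        raise RuntimeError("decoder output disagrees with the original data")
    return len(encoded)
```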
So say you and I have two rival theories of black hole dynamics. If the theories differ in a scientifically meaningful way, they must make different predictions about some data that could be observed. That means the compressors corresponding to our theories will assign different codelengths to some observations in the dataset. If your theory is more accurate, it will achieve a shorter codelength overall. This could happen because, say, your theory properly accounts for the velocity dispersion of galaxies under the influence of dark matter. Or it could happen because my theory gets hit with a big Black Swan penalty when it cannot explain an astrophysical jet coming from a black hole.
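The Black Swan penalty falls straight out of the arithmetic: an ideal code spends -log2(p) bits on an event the theory assigns probability p, so an event the theory treats as nearly impossible is enormously expensive to encode. A toy example with invented probabilities:

```python
import math

def codelength_bits(theory: dict, observations: list) -> float:
    # Ideal Shannon codelength: -log2 of the probability the theory assigns
    # to each observed outcome, summed over the whole dataset.
    return sum(-math.log2(theory[obs]) for obs in observations)

# Made-up numbers: your theory thinks jets are merely uncommon, mine thinks
# they are all but impossible.
theory_yours = {"no_jet": 0.7, "jet": 0.3}
theory_mine  = {"no_jet": 1 - 1e-9, "jet": 1e-9}

observations = ["no_jet"] * 9 + ["jet"]

print(codelength_bits(theory_yours, observations))  # ~6.4 bits total
print(codelength_bits(theory_mine, observations))   # ~29.9 bits, almost all from the one jet
```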
What about the fact that the best compression algorithm may be insanely expensive to run? We know the math that describes the behavior of quarks, which is to say, we can in principle generate the results of all possible quark experiments by solving a few equations. However, doing computations with the theory is extremely expensive: it takes something like 10^15 floating-point operations to compute, say, some basic properties of the proton to 1% accuracy.
Good point. My answer is: yes, we have to accept a speed/accuracy tradeoff. That doesn’t seem like such a disaster in practice.
Some people, primarily Matt Mahoney, have actually organized data compression contests similar to what I’m advocating. Mahoney’s solution is simply to impose a time limit that is reasonable but arbitrary. In the future, researchers could develop a spectrum of theories, each of which occupies a non-dominated position on a speed/compression curve. Unless something Very Strange happened, each faster, less accurate theory would be related to its slower, more accurate cousin by a standard suite of approximations. (It would be strange, but interesting, if you could get a theory that is both fast and accurate by making a nonstandard approximation or introducing some kind of new concept.)
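By a non-dominated position I mean a point on the Pareto frontier: no rival theory is both faster and achieves a shorter codelength. A sketch of that bookkeeping, with invented theory names and numbers:

```python
def pareto_frontier(theories):
    # Each entry is (name, runtime_seconds, codelength_bits). Keep only the
    # theories for which no other theory is at least as fast AND at least as
    # compact, with a strict improvement in one of the two.
    frontier = []
    for name, t, bits in theories:
        dominated = any(t2 <= t and b2 <= bits and (t2 < t or b2 < bits)
                        for _, t2, b2 in theories)
        if not dominated:
            frontier.append((name, t, bits))
    return sorted(frontier, key=lambda entry: entry[1])

# Invented numbers for illustration only.
theories = [
    ("full_lattice_qcd", 1e6, 1.0e9),
    ("effective_theory", 3e3, 1.2e9),
    ("crude_fit",        1e1, 2.0e9),
    ("slow_and_sloppy",  5e5, 1.9e9),  # dominated: slower AND less compact
]
print(pareto_frontier(theories))
```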