zero_call comments on Development of Compression Rate Method

zero_call 30 May 2010 8:02 UTC
This seems very well written, and I'd like to compliment you in that regard. I find the shaman example amusing and very fun to read.

For Sophie, if she has a large data set, then her theory should be able to predict a data set for the same experimental configuration, and the two data sets could then be compared. That is the obvious standard, and I'm not sure why it isn't permitted here. Perhaps you were trying to emphasize Sophie's desire to go on and test her theory on different experimental parameters, etc.

The original shaman example works very well for me; it is fairly basic and doesn't make any unsubstantiated claims. In the later examples, however, you need to say more about how you get from theory --> data. In the post you say,

She immediately returns to her office and spends the next several weeks writing Matlab code, converting her theory into a compression algorithm. The resulting compressor is highly successful: it shrinks the corpus of experimental data from an initial size of 8.7e11 bits to an encoded size of 3.3e9 bits.

Without knowing the details of how you go from the theory to the compressed end product, it's hard to judge whether this method makes sense. Actually, I would probably be fairly satisfied if you had stopped after the second section. But the third section, with the competition between colleagues, implies some unknown, nontrivial relation between the theory, its fitting parameters, the compression program, the size of that program, and the final compressed data.

It all seems too vague to support a conclusion like “add the compression program size and the final data size to get the final number”.
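To make the rule I'm questioning concrete, here is a toy Python sketch of how I read it (my interpretation, not necessarily your method). Everything in it is hypothetical: the data set, the two decoders, and the use of zlib as a stand-in "generic" compressor. Each candidate is a decoder program plus an encoded file, and the score is the sum of their sizes in bits.

```python
import zlib

# Hypothetical toy data set: a 256-byte ramp repeated 100 times (25,600 bytes).
DATA = bytes(range(256)) * 100

# Hypothetical "generic" compressor: no theory of the data, just zlib.
DECODER_A = "import zlib\ndef decode(b):\n    return zlib.decompress(b)\n"
ENCODED_A = zlib.compress(DATA, 9)

# Hypothetical "theory-based" compressor: it assumes the data is a repeated
# ramp, so the encoded file only needs to store the repeat count.
DECODER_B = "def decode(b):\n    return bytes(range(256)) * int.from_bytes(b, 'big')\n"
ENCODED_B = (100).to_bytes(4, "big")


def decode_with(decoder_source: str, encoded: bytes) -> bytes:
    """Run the decoder program on the encoded data (to check it is lossless)."""
    namespace: dict = {}
    exec(decoder_source, namespace)
    return namespace["decode"](encoded)


def two_part_score(decoder_source: str, encoded: bytes) -> int:
    """Total length in bits: decoder program size plus encoded data size.

    Assumption: the program is measured as its UTF-8 source bytes.
    """
    return 8 * len(decoder_source.encode("utf-8")) + 8 * len(encoded)


if __name__ == "__main__":
    for name, src, enc in [("A", DECODER_A, ENCODED_A), ("B", DECODER_B, ENCODED_B)]:
        assert decode_with(src, enc) == DATA  # both must reproduce the data exactly
        print(name, two_part_score(src, enc), "bits")
```

Even in this toy case, the sum is only well defined once you fix how the program is measured (source bytes here, which is my assumption). A different convention, such as compiled size, or Matlab instead of Python, shifts every score. That is part of why the third section feels underspecified to me.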