If anyone is curious about regular archivers, Joyce became less compressible throughout his life. The compression ratios (compressed bytes per 100 characters) for Dubliners, Portrait, Ulysses, and Wake are: with gzip -9: 38, 38, 42, 47; with paq8l -7: 24, 24, 26, 33. LZMA and PPMd fall between these numbers in unsurprising ways. Dubliners and Portrait seem about as compressible as other fiction in English.
Of course, I performed these calculations using a server in Australia, where Finnegans Wake is in the public domain.
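For anyone who wants to try a measurement like this, here is a minimal sketch of the "bytes per 100 characters" figure using Python's standard gzip module at level 9. The function name `bytes_per_100_chars` and the file name in the usage note are my own placeholders, not anything from the original calculation, and the result includes the small fixed gzip header, so it may differ slightly from what the command-line tool reports.

```python
import gzip


def bytes_per_100_chars(text: str, level: int = 9) -> float:
    """Return compressed size in bytes per 100 characters of input text.

    Assumption: the text is encoded as UTF-8 before compression, and
    "characters" means Python code points, not bytes.
    """
    if not text:
        raise ValueError("text must be non-empty")
    raw = text.encode("utf-8")
    # compresslevel=9 corresponds to the strongest setting, like `gzip -9`.
    compressed = gzip.compress(raw, compresslevel=level)
    return 100 * len(compressed) / len(text)
```

To use it on a plain-text novel, read the file and pass the string in, for example `bytes_per_100_chars(open("dubliners.txt", encoding="utf-8").read())`, where `dubliners.txt` is a hypothetical local copy. A lower number means the text is more compressible.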
This comment was prompted by Finnegans Wake seeming like an odd choice of a novel. War and Peace is a more prototypical novel, so you probably didn’t mean anything by the choice.
Can anyone suggest other hard-to-compress novels?
Thanks for the pointer to paq8l. And it won the Hutter Prize too! That’s funny because my post can be viewed as a comment on the relevance of the Hutter Prize.
Finnegans Wake was just my first idea for “novel that’s hard to compress”.