PhilGoetz comments on Fallacies of Compression

PhilGoetz 17 Dec 2017 2:10 UTC
2 points
Great post! There is also the non-discrete aspect of compression: information loss. English has, according to some dictionaries, over a million words. It’s unlikely we store most of our information in English. Probably there is some sort of dimensionality reduction, like PCA. In any case, the compression is probably lossy. This means people with different histories will use different frequency tables for their compression, and will throw out different information when encoding a verbal statement. I think you would almost certainly find that if you measured word-usage frequencies for different people and then clustered those distributions, some clusters would correspond to ideologies. The interesting question is which comes first: the ideology, or the word-usage frequencies (caused by different life experiences).
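To make the measure-then-cluster idea concrete, here is a minimal sketch. Everything specific in it is an assumption for illustration: the four toy "speakers" and their texts are made up, Jensen-Shannon divergence is just one reasonable way to compare word-frequency distributions, and the 0.5 threshold and the simple "link anything closer than the threshold" grouping are arbitrary choices rather than anything the comment prescribes. It says nothing about which comes first, ideology or word usage.

```python
from collections import Counter
from math import log2

def word_distribution(text, vocab):
    """Relative frequency of each vocab word in one speaker's text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in vocab]

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)
    return (kl(p, m) + kl(q, m)) / 2

def cluster_speakers(dists, threshold):
    """Group speakers connected by pairwise divergence below threshold."""
    names = list(dists)
    parent = {n: n for n in names}
    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if js_divergence(dists[a], dists[b]) < threshold:
                parent[find(b)] = find(a)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())

# Hypothetical toy corpus: one short text per speaker.
texts = {
    "a": "freedom market freedom taxes market liberty",
    "b": "market taxes taxes freedom liberty market",
    "c": "solidarity workers union workers equality union",
    "d": "union equality workers solidarity workers union",
}
vocab = sorted({w for t in texts.values() for w in t.split()})
dists = {name: word_distribution(t, vocab) for name, t in texts.items()}
clusters = cluster_speakers(dists, threshold=0.5)  # threshold is arbitrary
print(clusters)
```

On real data you would want far more text per person, some smoothing for rare words, and a proper clustering method; the sketch only shows the shape of the measurement.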