Has anyone done a study on redundant information in languages?
I’m just mildly curious, because a back-of-the-envelope calculation suggests that English is about 4.7x redundant—which on a side note explains how we can esiayl regnovze eevn hrriofclly msispled wrods.
(Actually, that would be an interesting experiment—remove or replace fraction x of the letters in a paragraph and see at what average x participants can no longer make a “corrected” copy.)
I’d predict that Chinese is much less redundant in its spoken form, but I have no idea how to measure redundancy in its written form. (By stroke? By radical?)
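For anyone who wants to try this, here is a minimal Python sketch of both ideas. It assumes the 4.7 figure comes from comparing log2(26) ≈ 4.7 bits for a uniformly random letter with the roughly 1 bit per letter often quoted from Shannon's guessing experiments; that is a guess at the calculation, not something stated above. The names `redundancy_factor` and `corrupt` are made up for illustration.

```python
import math
import random
import string


def redundancy_factor(alphabet_size=26, bits_per_letter=1.0):
    """Ratio of maximum bits per letter to an assumed actual bits per letter.

    The default of 1.0 bit per letter is an assumed ballpark, not a measurement.
    """
    return math.log2(alphabet_size) / bits_per_letter


def corrupt(text, x, mode="replace", seed=None):
    """Remove or replace a fraction x of the letters in text.

    Only alphabetic characters are candidates; spaces and punctuation are kept.
    """
    if mode not in ("replace", "remove"):
        raise ValueError("mode must be 'replace' or 'remove'")
    rng = random.Random(seed)
    letter_positions = [i for i, ch in enumerate(text) if ch.isalpha()]
    k = round(x * len(letter_positions))
    chosen = set(rng.sample(letter_positions, k))
    out = []
    for i, ch in enumerate(text):
        if i not in chosen:
            out.append(ch)
        elif mode == "replace":
            # Swap in a lowercase letter that differs from the original.
            alternatives = [c for c in string.ascii_lowercase if c != ch.lower()]
            out.append(rng.choice(alternatives))
        # In "remove" mode the chosen letter is simply dropped.
    return "".join(out)


if __name__ == "__main__":
    print(round(redundancy_factor(), 2))
    print(corrupt("remove or replace some of the letters", 0.2, seed=0))
```

Sweeping `x` from 0 to 1 and handing each output to participants would give the curve described above; scoring the "corrected" copies is left out of this sketch.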
To measure the artistic merit of texts, Kolmogorov also employed a letter-guessing method to evaluate the entropy of natural language. In information theory, entropy is a measure of uncertainty or unpredictability, corresponding to the information content of a message: the more unpredictable the message, the more information it carries. Kolmogorov turned entropy into a measure of artistic originality. His group conducted a series of experiments, showing volunteers a fragment of Russian prose or poetry and asking them to guess the next letter, then the next, and so on. Kolmogorov privately remarked that, from the viewpoint of information theory, Soviet newspapers were less informative than poetry, since political discourse employed a large number of stock phrases and was highly predictable in its content. The verses of great poets, on the other hand, were much more difficult to predict, despite the strict limitations imposed on them by the poetic form. According to Kolmogorov, this was a mark of their originality. True art was unlikely, a quality probability theory could help to measure.
Yes, it’s been studied quite a bit by linguists. You can find some pointers in http://www.gwern.net/Notes#efficient-natural-language which may be helpful.
Thanks.
… huh. Now I’m thinking about actually doing that experiment...
I ran into another thing in that vein (the Kolmogorov passage quoted earlier in this thread):
-- “The Man Who Invented Modern Probability”, Nautilus, Issue 4: The Unlikely
This also happens to me with music. I enjoy “unpredictable” music more than predictable music. Knowing music theory, I know which notes are supposed to be played if a song is in a certain key, and if a note or chord isn’t the predicted one, it feels a bit more enjoyable. I wonder if the same technique could be applied to different genres of music with the same result, e.g. radio-friendly pop music vs. non-mainstream music.
I wonder what that metric has to say about Finnigan’s Wake...
By other metrics, Joyce became less compressible throughout his life. Going closer to the original metric, you demonstrate that the title is hard to compress (especially the lack of apostrophe).
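One crude, off-the-shelf stand-in for "compressibility" is the size of a text after a general-purpose compressor has had a go at it. The sketch below uses Python's `zlib`; this is only a proxy and not necessarily the metric meant above, and `compressed_bits_per_char` is a hypothetical helper name.

```python
import zlib


def compressed_bits_per_char(text):
    """Bits per character after zlib compression at the highest level.

    Lower values mean the text is more predictable to this compressor.
    For very short inputs the fixed zlib overhead dominates the result.
    """
    if not text:
        raise ValueError("text must be non-empty")
    packed = zlib.compress(text.encode("utf-8"), 9)
    return 8 * len(packed) / len(text)


if __name__ == "__main__":
    stock_phrases = "the plan was fulfilled ahead of schedule. " * 100
    print(compressed_bits_per_char(stock_phrases))
```

Comparing texts only makes sense at similar lengths, since longer inputs give the compressor more context to exploit.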
If you do, please post about it!
Studies of this form have been done at least on the edge case where all the material removed is from the end (i.e., tests of the ability of subjects to predict the next letter in an English text). I’d be interested to see your more general test but am not sure if it has been done (except, perhaps, as a game show).
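The next-letter version can be simulated by letting a simple model play the guesser instead of a human subject. The sketch below is an assumption-laden toy, not a reconstruction of any published protocol: a context model trained on one text ranks its guesses for each character of another, and the entropy of the resulting guess numbers serves as a rough indicator of how predictable the text is to that model. The function names and the `order` parameter are hypothetical.

```python
import math
from collections import Counter, defaultdict


def guess_counts(train, test, order=2):
    """For each character of test (after the first `order`), return how many
    guesses a simple context model trained on `train` needs to get it right."""
    alphabet = sorted(set(train) | set(test))
    overall = Counter(train)
    by_context = defaultdict(Counter)
    for i in range(order, len(train)):
        by_context[train[i - order:i]][train[i]] += 1
    counts = []
    for i in range(order, len(test)):
        seen = by_context.get(test[i - order:i], Counter())
        # Guess order: most frequent after this context, then most frequent
        # overall, then alphabetical to break ties deterministically.
        ranking = sorted(alphabet, key=lambda c: (-seen[c], -overall[c], c))
        counts.append(ranking.index(test[i]) + 1)
    return counts


def guess_entropy(counts):
    """Entropy in bits of the empirical distribution of guess numbers."""
    total = len(counts)
    freq = Counter(counts)
    return -sum(n / total * math.log2(n / total) for n in freq.values())


if __name__ == "__main__":
    train = "the quick brown fox jumps over the lazy dog " * 50
    test = "the lazy dog jumps over the quick brown fox"
    counts = guess_counts(train, test)
    print(guess_entropy(counts))
```

A human guesser draws on far more context than two preceding characters, so numbers from this toy should come out higher than what the human experiments report.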