One interesting compression of Finnegans Wake that goes beyond a .zip file is a plot summary. Obviously it is lossy compression, but how lossy?
One way to think about it is to imagine you are given a plot summary of Finnegans Wake, and asked to reconstruct it. What additional information would you want? A reasonably extensive knowledge of the English language and its grammar, certainly. Most likely a description of Joyce’s writing style. Knowledge of human psychology and of the setting.
Obviously, we only need to include as much information as is actually used. If Finnegans Wake never contains the word “indubitably” then we don’t need its definition. Also, ideally, all of this is written in some sort of natural representation with no redundancy, rather than in English, but we can think about writing the above in English as an approximation.
Knowing the algorithm, we can then add corrections. Suppose, when we take the above information, and try to reproduce the text, we end up putting a semicolon instead of a period 600 characters in (perhaps semicolons are usually more consistent with Joyce’s style, but here he was feeling capricious). We could add a note to the effect of “600 characters: period, not semicolon”. A bunch of these notes (which don’t really take up much space) together with the information above make up our perfectly compressed string.
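To make the "model plus corrections" idea concrete, here is a minimal sketch in Python. Everything in it is a stand-in: `toy_predict` plays the role of the plot summary, the knowledge of English and the description of Joyce's style, and the `draft` and `actual` strings are invented for illustration, not taken from the book. The only thing it shows is that a deterministic predictor plus a list of notes like "position 13: period, not semicolon" reproduces the text exactly.

```python
from typing import Callable, Dict, Tuple

# A predictor maps the text reconstructed so far to a guess for the next character.
Predictor = Callable[[str], str]


def compress(text: str, predict: Predictor) -> Tuple[int, Dict[int, str]]:
    """Return the text length plus a correction for every position the predictor gets wrong."""
    corrections: Dict[int, str] = {}
    for i, actual_char in enumerate(text):
        if predict(text[:i]) != actual_char:
            corrections[i] = actual_char
    return len(text), corrections


def decompress(length: int, corrections: Dict[int, str], predict: Predictor) -> str:
    """Rebuild the text by running the same predictor and applying the stored corrections."""
    out = []
    for i in range(length):
        prefix = "".join(out)
        # Use the stored note if there is one, otherwise trust the predictor.
        out.append(corrections[i] if i in corrections else predict(prefix))
    return "".join(out)


# Hypothetical example data: the "model" would have written a semicolon
# where the author actually wrote a period.
actual = "The river ran. It ran past the town; it ran on."
draft = "The river ran; It ran past the town; it ran on."


def toy_predict(prefix: str) -> str:
    """Stand-in for the real model: it simply reads off the draft."""
    return draft[len(prefix)] if len(prefix) < len(draft) else " "


length, corrections = compress(actual, toy_predict)
restored = decompress(length, corrections, toy_predict)
```

The compressed form here is just the length and the single note `{13: "."}`, but that only looks small because the predictor is not counted. In the scheme above, the description of the predictor is part of the compressed string too.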
I cannot see how you could reconstruct a novel from a plot summary, regardless of additional information provided. Do you mean a text such that if a person read it, then read the actual Finnegans Wake, say, a year later, 95% of the time he would not notice the difference? In any case, this scheme clearly has greater K-complexity than a .zip file.