I wonder if it helps to arrange K-information in layers. You could start with “Almost all crows are black”, and then add footnotes for how rare white crows actually are, what causes them, how complete we think our information about crow color distribution is and why, and possibly some factors I haven’t thought of.
Layering or modularizing the hypothesis: of course you can do this, and you typically do. But layering doesn’t typically change the total quantity of K-information; a complex hypothesis still has a lot of K-information whether you present it neatly layered or jumbled together. Which brings us to the issue of just why we bother calculating the K-information content of a hypothesis in the first place.
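A crude illustration, hedging hard: real K-complexity is uncomputable, so this uses zlib-compressed length as a rough stand-in, and the white-crow figures are invented. The point is just that rearranging the same content barely changes total description length:

    import zlib

    # Rough stand-in for K-complexity: length of the zlib-compressed text.
    # (True Kolmogorov complexity is uncomputable; this is only illustrative.)
    def approx_k(text: str) -> int:
        return len(zlib.compress(text.encode("utf-8")))

    layers = [
        "Almost all crows are black.",
        "White crows occur at roughly one in ten thousand.",  # invented figure
        "The white ones are albinos or leucistic.",
    ]

    layered = "\n".join(layers)            # neatly layered presentation
    jumbled = " ".join(reversed(layers))   # same content, jumbled order

    print(approx_k(layered), approx_k(jumbled))   # nearly identical sizes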
There is a notion, mentioned in Jaynes and also in another thread active right now, that the K-information content of a hypothesis is directly related to the prior probability we ought to attach to it, prior to (or in the absence of) empirical evidence. So, it seems to me, the interesting thing about your layering suggestion is how the layering should tie in to the Bayesian inference machinery we use to evaluate theories.
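To pin the notion down, the Solomonoff-style form of it (not necessarily exactly what Jaynes had in mind) is a prior that falls off exponentially with description length; in LaTeX,

    P(h) \propto 2^{-K(h)}

so each additional bit of K-information halves the no-data prior.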
For example, suppose we have a hypothesis which, based on the evidence so far, has a subjective “probability of correctness” of, say, 0.5. Then we get a new bit of evidence: we observe a white (albino) crow. Doing standard Bayesian updating, the probability of our hypothesis drops to, say, 0.001. So we decide to try to resurrect the hypothesis by adding another layer. Trouble is, we have just increased its K-complexity, and that ought to hurt us in our original “no-data” prior. Worse, we already have data, lots of it. So is there some algebraic trick that lets us add the new layer without going back to evidential square one?
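A minimal sketch of the arithmetic in that story, with likelihoods invented purely to reproduce the drop from 0.5 to roughly 0.001:

    # Bayes' rule for the white-crow story above. Both likelihoods are
    # invented, chosen only so that the posterior lands near 0.001.
    prior = 0.5       # P(h), given the evidence so far
    like_h = 0.001    # P(white crow | h): h barely tolerates white crows
    like_alt = 1.0    # P(white crow | not-h): the alternative expects them

    posterior = prior * like_h / (prior * like_h + (1 - prior) * like_alt)
    print(round(posterior, 6))   # ~0.000999, i.e. roughly 0.001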
K-information is about communicating to “someone”—do you compute the amount of K-information for the most receptive person you’re communicating with, or do you have a different amount for each layer of detail?
Actually, you might have a tree structure, not just layers—the prevalence of white crows in time and space is a different branch than the explanation of how crows can be white.
K-information is about communicating to “someone”—do you compute the amount of K-information for the most receptive person you’re communicating with, or do you have a different amount for each layer of detail?
A very interesting question. Especially when you consider the analogy with Kolmogorov complexity. Here we have an ambiguity as to which person we are communicating to; there, the ambiguity was about exactly which model of universal Turing machine we were programming. And there, there was a theorem to the effect that the differences among Turing machines aren’t all that big. Do we have a similar theorem here, for the differences among people, seen as universal programmable epistemic engines?
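For reference, the Turing-machine result I mean is the invariance theorem; in LaTeX,

    K_U(x) \le K_V(x) + c_{U,V} \quad \text{for all strings } x

where the constant c_{U,V} depends on the two universal machines U and V but not on the string being described.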
Worse, we already have data, lots of it. So is there some algebraic trick that lets us add the new layer without going back to evidential square one?
Bayesian updating is timeless. It doesn’t care whether you observed the data before or after you wrote the hypothesis.
So, it sounds like you are suggesting that we can back out all that data, change our hypothesis and prior, and then read the data back in. In theory, yes. But sometimes we don’t even remember the data that brought us to where we are now. Hence the desirability of a trick. Is there an updating-with-new-hypothesis rule to match Bayes’s updating-with-new-evidence rule?
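One partial answer, sketched under a strong assumption: if every hypothesis we might add admits the same sufficient statistics (here a conjugate Beta-Bernoulli model of crow color, with a hypothetical log_marginal helper), then we never needed the raw sightings, only the running counts, and a late-arriving layer can be scored against everything seen so far:

    from math import lgamma

    # Running sufficient statistics for crow sightings: we keep only
    # the counts, never the raw sequence of observations.
    black, white = 10_000, 1   # illustrative tallies accumulated so far

    def log_marginal(alpha: float, beta: float) -> float:
        """Log marginal likelihood of the counts under a Beta(alpha, beta)
        prior on the white-crow rate (Beta-Bernoulli conjugacy)."""
        return (lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
                + lgamma(alpha + white) + lgamma(beta + black)
                - lgamma(alpha + beta + white + black))

    # Original hypothesis vs. a "layer" added after the data arrived.
    # The new hypothesis is scored on the same counts; no replay needed.
    h_old = log_marginal(1.0, 1.0)     # flat layer: any white-crow rate
    h_new = log_marginal(1.0, 100.0)   # new layer: white crows are rare
    print(h_old, h_new)                # h_new fits the tallies far better

Outside conjugate families such compact sufficient statistics generally don’t exist, which may be why no fully general updating-with-new-hypothesis rule presents itself.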