Okay, I understand. The problem with fundamental microstates is that they only really make sense if they are possible worlds, and possible worlds bring their own problems.
One is: we can gesture at them, but we can’t grasp them. They are too big: each one describes a whole world. We can grasp the proposition that snow is white, but not the equivalent disjunction of all the possible worlds where snow is white. So we can’t use them for anything psychological like subjective Bayesianism. But maybe that’s not your goal anyway.
A more general problem is that there are infinitely many possible worlds. There are even infinitely many where snow is white. This means it is unclear how we should define a uniform probability distribution over them. Naively, if $1/\infty$ is 0, their probabilities do not sum to 1, and if it is larger than 0, they sum to infinity. Either option would violate the probability axioms.
Warning: long and possibly unhelpful tangent ahead
Wittgenstein’s solution for this and other problems (in the Tractatus) was to ignore possible worlds and instead regard “atomic propositions” as basic. Each proposition is assumed to be equivalent to a finite logical combination of such atomic propositions, where logical combination means propositional logic (i.e. with connectives like not, and, or, but without quantifiers). Then the a priori probability of a proposition is defined as the number of rows in its truth table where the proposition is true, divided by the total number of rows. For example, for a and b atomic, the proposition a∨b has probability 3/4, while a∧b has probability 1/4: the disjunction has three of the four possible truth-makers, (true, true), (true, false), and (false, true), while the conjunction has only one, (true, true).
This definition in terms of the ratio of true rows in the “atomicized” truth table is equivalent to the assumption that all atomic propositions have probability 1/2 and that they are all probabilistically independent.
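As a sanity check, here is a minimal sketch of that truth-table definition, with formulas encoded as Python functions of boolean arguments (a toy encoding of my own, nothing canonical):

```python
# A minimal sketch of the truth-table definition of a priori probability:
# enumerate all truth-value assignments to the atoms and count the rows
# on which the formula comes out true.
from itertools import product

def a_priori_probability(formula, num_atoms):
    """Fraction of truth-table rows on which `formula` comes out true."""
    rows = list(product([True, False], repeat=num_atoms))
    true_rows = sum(1 for row in rows if formula(*row))
    return true_rows / len(rows)

print(a_priori_probability(lambda a, b: a or b, 2))   # 0.75 for a ∨ b
print(a_priori_probability(lambda a, b: a and b, 2))  # 0.25 for a ∧ b
```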
Wittgenstein did not do it, but we can then also define a measure of information content (or surprisal, or entropy, or whatever we want to call it) of propositions, in the following way:
Each atomic proposition has information content 1.
The information content of a conjunction of two distinct atomic propositions is the sum of their information contents (additivity).
The information content of a tautology is 0.
So for a conjunction of n distinct atomic propositions, the information content of that conjunction is n ($1+1+\dots+1=n$), while its probability is $2^{-n}$ ($1/2 \times 1/2 \times \dots \times 1/2 = 2^{-n}$). Generalizing this to arbitrary (i.e. possibly non-atomic) propositions A, the relation between probability p and information content i is
$2^{-i(A)} = p(A)$
or, equivalently,
$i(A) = -\log_2 p(A)$.
Now that formula sure looks familiar!
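For illustration, the same toy setup as above, extended with $i(A) = -\log_2 p(A)$ (again just a sketch, with the helper repeated so the snippet stands alone):

```python
# Information content i(A) = -log2 p(A) on top of the truth-table
# probability. Contradictions (p = 0) get infinite information content.
import math
from itertools import product

def a_priori_probability(formula, num_atoms):
    rows = list(product([True, False], repeat=num_atoms))
    return sum(1 for row in rows if formula(*row)) / len(rows)

def information_content(formula, num_atoms):
    p = a_priori_probability(formula, num_atoms)
    return math.inf if p == 0 else math.log2(1 / p)

print(information_content(lambda a, b, c: a and b and c, 3))  # 3.0 bits, p = 1/8
print(information_content(lambda a, b: a or not a, 2))        # 0.0 bits (tautology)
print(information_content(lambda a, b: a or b, 2))            # ≈ 0.415 bits, p = 3/4
```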
The advantage of Wittgenstein’s approach is that we can assign an a priori probability distribution to propositions without having to assume a uniform probability distribution over possible worlds. Since each proposition is assumed to be a finite logical combination of atomic propositions, the problems with infinity are avoided. The same holds for information content (or “entropy” if you will).
Problem is … it is unclear what atomic propositions are. Wittgenstein did believe in them, and so did Bertrand Russell, but Wittgenstein eventually gave up the idea. To be clear, propositions expressed by sentences like “Snow is white” are not atomic in Wittgenstein’s sense: “Snow is white” is not probabilistically independent of “Snow is green”, and it doesn’t obviously have a priori probability 1/2. Moreover, the restriction to propositional logic is problematic. If we add quantifiers, Wittgenstein suggested interpreting the universal quantifier “all” as a possibly infinite conjunction of atomic propositions, and the existential quantifier “some” as a possibly infinite disjunction of atomic propositions. But that leads back to problems with infinity: it would always give the former probability 0 and the latter probability 1.
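Spelling that out (a sketch in my own notation, with $Fa_1, Fa_2, \dots$ standing for the atomic instances of the quantified claim, each independent with probability 1/2):
$p(\forall x\, Fx) = \lim_{n\to\infty} p(Fa_1 \wedge \dots \wedge Fa_n) = \lim_{n\to\infty} (1/2)^n = 0$
$p(\exists x\, Fx) = \lim_{n\to\infty} p(Fa_1 \vee \dots \vee Fa_n) = \lim_{n\to\infty} \left(1 - (1/2)^n\right) = 1$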
So logical atomism may be just as much a dead end as possible worlds, perhaps worse. But it is somewhat interesting to note that approaches like algorithmic complexity have similar issues. We may want to assign a string of bits a probability or a complexity (an entropy? an information content?), but we may also want to say that some such string corresponds to a proposition, e.g. a hypothesis we are interested in. There is a superficial way of associating a binary string with a propositional formula, by interpreting e.g. 1001 as the conjunction a∧¬b∧¬c∧d. But there likewise seems to be no room for quantifiers in this interpretation.
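To make that superficial association concrete (purely illustrative, nothing standard):

```python
# Toy version of the encoding: read a bit string as a conjunction of
# literals over atoms a, b, c, ... (1 = the atom itself, 0 = its negation).
# Under the assignment above such a conjunction has probability 2^(-length),
# i.e. information content equal to its length.
def string_to_conjunction(bits):
    atoms = "abcdefghijklmnopqrstuvwxyz"
    literals = [atoms[i] if bit == "1" else "¬" + atoms[i]
                for i, bit in enumerate(bits)]
    return " ∧ ".join(literals)

print(string_to_conjunction("1001"))  # a ∧ ¬b ∧ ¬c ∧ d
print(2.0 ** -len("1001"))            # 0.0625, i.e. 4 bits of information content
```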
I guess a question is what you want to do with your entropy theory. Personally I would like to find some formalization of Ockham’s razor which is applicable to Bayesianism. Here the problems mentioned above appear fatal. Maybe for your purposes the issues aren’t as bad though?