It’s pretty easy to see how it would work if there are only finitely many hypotheses, say n: in that case, Ω is basically just the collection of binary strings of length n (assuming the hypothesis space is carved up appropriately), and each map V_A is evaluation at a particular coordinate. Sure enough, at each coordinate, half the elements of Ω evaluate to 1, and half to 0!
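(To make the counting concrete, here is a minimal sketch in Python of the model as I understand it, with a toy n = 4; the variable names are mine, not yours.)

```python
from itertools import product

n = 4  # toy number of hypotheses, just for illustration

# Omega: every binary string of length n, i.e. every way of assigning a
# truth value to each of the n hypotheses.
omega = list(product([0, 1], repeat=n))

# V_A is just "read off coordinate A".  At every coordinate, exactly half
# the elements of Omega evaluate to 1.
for a in range(n):
    true_count = sum(w[a] for w in omega)
    print(f"P(V_{a} = True) = {true_count / len(omega)}")  # prints 0.5 each time
```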
Here are a few problems that I have with this approach:
1. This approach makes your focus on the case where the hypothesis A is “unspecified” seem very mysterious. Under this model, we have P(V_A = True) = 0.5 even for a hypothesis A that is entirely specified, down to its last bit. So why all the talk about how a true prior probability for A needs to be based on complete ignorance even of the content of A? Under this model, even if you grant complete knowledge of A, you’re still assigning it a prior probability of 0.5. Much of the push-back you got seemed to be around the meaningfulness of assigning a probability to an unspecified hypothesis. But you could have sidestepped that issue and still established the claim in the OP under this model, because here the claim is true even of specified hypotheses. (However, you would still need to justify that this model is how we ought to think about Bayesian updating. My remaining concerns address this.)
2. By having Ω be the collection of all bit strings of length n, you’ve dropped the condition that the maps v respect logical operations. This is equivalent to dropping the requirement that the possible worlds be logically possible. E.g., your sample space would include maps v such that v(A) = v(~A) for some hypothesis A. But, maybe you figure that this is a feature, not a bug, because knowledge about logical consistency is something that the agent shouldn’t yet have in its prior state of complete ignorance. But then …
3. … If the agent starts out as logically ignorant, how can it work with only a finite number of hypotheses? It doesn’t start out knowing that A, A&A, A&A&A, etc., can all be collapsed down to just A, and that’s infinitely many hypotheses right there. But maybe you mean for the n hypotheses to be “atomic” propositions, each represented by a distinct proposition letter A, B, C, …, with no logical dependencies among them, and all other hypotheses built up out of these “atoms” with logical connectives. It’s not clear to me how you would handle quantifiers this way, but set that aside. The more important problem is …
4. … How do you ever accomplish any nontrivial Bayesian updating under this model? For suppose that you learn somehow that A is true. Now, conditioned on A, what is the probability of B? Still 0.5. Even if you learn the truth value of every hypothesis except B, you still would assign probability 0.5 to B.
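(To illustrate Problem 4 by brute-force counting, here is a minimal sketch, again with a toy n = 4 and uniform weight on Ω; the helper name is mine.)

```python
from itertools import product

n = 4
omega = list(product([0, 1], repeat=n))  # uniform weight on every bit string

def prob_true_given(evidence, b):
    """P(V_b = True | the coordinate values fixed in `evidence`), by counting."""
    worlds = [w for w in omega if all(w[i] == v for i, v in evidence.items())]
    return sum(w[b] for w in worlds) / len(worlds)

print(prob_true_given({}, 3))                  # prior for B:              0.5
print(prob_true_given({0: 1}, 3))              # after learning A:         0.5
print(prob_true_given({0: 1, 1: 0, 2: 1}, 3))  # after learning all but B: 0.5
```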
More generally, one could imagine a probability distribution on the hypothesis space controlling the “weighting” of elements of Ω. For instance, if hypothesis #6 gets its probability raised, then those mappings v in Ω such that v(6) = 1 would be weighted more than those such that v(6) = 0. I haven’t checked that this type of arrangement is actually possible, but something like it ought to be.
Is this a description of what the prior distribution might be like? Or is it a description of what updating on the prior distribution might yield?
If you meant the former, wouldn’t you lose your justification for claiming that the prior probability of an unspecified hypothesis is exactly 0.5? For, couldn’t it be the case that most hypotheses are true in most worlds (counted by weight), so that an unknown random hypothesis would be more likely to be true than not?
If you meant the latter, I would like to see how this updating would work in more detail. I especially would like to see how Problem 4 above could be overcome.
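For what it’s worth, the weighting you describe does look possible, at least as a product measure over the coordinates. Here is a rough sketch (the weights p are invented purely for illustration); it also shows why, on the “former” reading, the 0.5 figure would no longer be forced, since the prior for a randomly chosen hypothesis comes out as the average of the weights rather than 1/2.

```python
from itertools import product

n = 4
# Hypothetical per-hypothesis weights (made-up numbers): the chance that
# each hypothesis is true under the weighted distribution on Omega.
p = [0.9, 0.8, 0.7, 0.6]

def weight(w):
    """Product-measure weight of world w: p[i] if w[i] is 1, else 1 - p[i]."""
    out = 1.0
    for wi, pi in zip(w, p):
        out *= pi if wi else (1 - pi)
    return out

omega = list(product([0, 1], repeat=n))
assert abs(sum(weight(w) for w in omega) - 1.0) < 1e-9  # it is a genuine distribution

# Probability that a uniformly chosen "unspecified" hypothesis is true:
# the weight on worlds where coordinate a is 1, averaged over a.
avg = sum(sum(weight(w) for w in omega if w[a] == 1) for a in range(n)) / n
print(avg)  # 0.75 with these weights -- not 0.5
```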