No Good Logical Conditional Probability
Fix a theory T over a language L. A coherent probability function is a function ℙ that assigns a number in [0,1] to each sentence of L and satisfies the laws of probability theory; each coherent probability function represents a probability distribution on complete logical extensions of T.
One of many equivalent definitions of coherence is that ℙ is coherent if ℙ(φ₁)+⋯+ℙ(φₖ)=1 whenever T can prove that exactly one of φ₁,…,φₖ is true.
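For instance, here are just two instances of this rule, with φ and ψ arbitrary sentences:

$$\mathbb{P}(\varphi)+\mathbb{P}(\neg\varphi)=1, \qquad \mathbb{P}(\varphi\wedge\psi)+\mathbb{P}(\varphi\wedge\neg\psi)+\mathbb{P}(\neg\varphi)=1,$$

since T proves that exactly one of φ, ¬φ is true, and that exactly one of φ∧ψ, φ∧¬ψ, ¬φ is true. Subtracting the first equation from the second recovers the familiar additivity law ℙ(φ)=ℙ(φ∧ψ)+ℙ(φ∧¬ψ).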
Another very basic desirable property is that ℙ(φ)=1 only when φ is provable. There have been many proposals of specific coherent probability assignments that all satisfy this basic requirement. Many satisfy stronger requirements that give bounds on how far ℙ(φ) is from 1 when φ is not provable.
In this post, I modify the framework slightly to instead talk about conditional probability. Consider a function ℙ which takes in a consistent theory T and a sentence φ, and outputs a number ℙ(φ|T) in [0,1], which represents the conditional probability of φ given T.
We say that ℙ is coherent if:
- ℙ(φ₁|T)+⋯+ℙ(φₖ|T)=1 whenever T can prove that exactly one of φ₁,…,φₖ is true, and
- ℙ(φ₁∧φ₂|T)=ℙ(φ₁|T)⋅ℙ(φ₂|T∪{φ₁}), and
- If φ proves every sentence in T′, then ℙ(φ|T∪T′)≥ℙ(φ|T).
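Two consequences of the first condition, which the argument below uses without comment, are worth spelling out (they are the one-element and two-element instances of the rule):

$$T\vdash\varphi\ \Rightarrow\ \mathbb{P}(\varphi\mid T)=1, \qquad T\vdash\varphi\leftrightarrow\psi\ \Rightarrow\ \mathbb{P}(\varphi\mid T)=\mathbb{P}(\psi\mid T).$$

For the first, apply the condition to the one-element list φ. For the second, compare the lists φ, ¬φ and ψ, ¬φ, each of which T proves to contain exactly one true sentence.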
Theorem: There is no coherent conditional probability function ℙ such that ℙ(φ|T)=1 only when T proves φ.
Proof:
This proof will use the notation of log odds to make things simpler: write lo(φ|T) for log(ℙ(φ|T)/ℙ(¬φ|T)), which is +∞ when ℙ(φ|T)=1 and −∞ when ℙ(φ|T)=0.
Let ℙ be a coherent conditional probability function. Fix a sentence s which is neither provable nor disprovable from the empty theory. Construct an infinite sequence of theories T₀ ⊆ T₁ ⊆ T₂ ⊆ ⋯ as follows:
- T₀ is the empty theory.
- To construct Tₙ₊₁, choose a sentence rₙ such that neither rₙ nor ¬rₙ is provable in Tₙ∪{¬s}. If ℙ(rₙ|Tₙ∪{¬s}) ≤ 1/2, then let Tₙ₊₁ = Tₙ∪{s∨rₙ}. Otherwise, let Tₙ₊₁ = Tₙ∪{s∨¬rₙ}.
Fix an n, and without loss of generality, assume ℙ(rₙ|Tₙ∪{¬s}) ≤ 1/2, so that Tₙ₊₁ = Tₙ∪{s∨rₙ}. Since ℙ is coherent we have

ℙ(s|Tₙ) = ℙ(s∧(s∨rₙ)|Tₙ) = ℙ(s∨rₙ|Tₙ)⋅ℙ(s|Tₙ₊₁).

In particular, this means that ℙ(s|Tₙ₊₁) ≥ ℙ(s|Tₙ).
Observe that ℙ(¬s∧(s∨rₙ)|Tₙ) = ℙ(¬s∧rₙ|Tₙ) = ℙ(¬s|Tₙ)⋅ℙ(rₙ|Tₙ∪{¬s}) ≤ ℙ(¬s|Tₙ)/2, and ℙ(¬s∧(s∨rₙ)|Tₙ) = ℙ(s∨rₙ|Tₙ)⋅ℙ(¬s|Tₙ₊₁). Therefore, ℙ(s∨rₙ|Tₙ)⋅ℙ(¬s|Tₙ₊₁) ≤ ℙ(¬s|Tₙ)/2, so ℙ(s|Tₙ₊₁)/ℙ(¬s|Tₙ₊₁) ≥ 2⋅ℙ(s|Tₙ)/ℙ(¬s|Tₙ).
In the language of log odds, this means that lo(s|Tₙ₊₁) ≥ lo(s|Tₙ) + log 2.
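Chaining this bound over the first n steps spells out the telescoping that the next two paragraphs rely on:

$$\mathrm{lo}(s\mid T_n)\ \ge\ \mathrm{lo}(s\mid T_0)+n\log 2 \qquad \text{for every } n.$$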
Let T∞ be the union of all the Tₙ. Note that s proves every sentence in T∞, so by the third condition of coherence, ℙ(s|T∞) ≥ ℙ(s|Tₙ) for all n, so lo(s|T∞) ≥ lo(s|Tₙ) for all n.
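That is, the instance of the third condition being used takes φ = s, T = Tₙ, and T′ = T∞ (so that T∪T′ = T∞); the hypothesis that s proves every sentence of T′ holds because every sentence of T∞ has the form s∨(±rₖ):

$$\mathbb{P}(s\mid T_\infty)=\mathbb{P}(s\mid T_n\cup T_\infty)\ \ge\ \mathbb{P}(s\mid T_n).$$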
Consider lo(s|T₀) and lo(s|T∞). These numbers cannot both be finite, since lo(s|T∞) ≥ lo(s|T₀) + n⋅log 2 for all n. Therefore, at least one of ℙ(s|T₀) and ℙ(s|T∞) must be 0 or 1. However, neither T₀ nor T∞ proves or disproves s, so this means that ℙ assigns conditional probability 1 to some statement that cannot be proven.
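To make the mechanics of the construction concrete, here is a small numerical sanity check (a sketch of the finite case only, not of the theorem, which needs infinitely many independent sentences). Propositional atoms stand in for s and the candidate rₙ, "worlds" are complete truth assignments, and conditional probability is computed by the ratio rule ℙ(φ|T) = μ(φ∧T)/μ(T), which satisfies conditions 1–3 in this finite setting. All names in the snippet (MU, prob, log_odds, and so on) are mine, not from the post.

```python
import itertools
import math
import random

ATOMS = ["s"] + [f"r{i}" for i in range(10)]   # s plus ten candidate sentences r_n
WORLDS = list(itertools.product([False, True], repeat=len(ATOMS)))

random.seed(0)
raw = {w: random.uniform(0.1, 1.0) for w in WORLDS}   # strictly positive weights
total = sum(raw.values())
MU = {w: x / total for w, x in raw.items()}           # a probability measure on worlds

def atom(name):
    i = ATOMS.index(name)
    return lambda w: w[i]

def neg(phi):
    return lambda w: not phi(w)

def disj(phi, psi):
    return lambda w: phi(w) or psi(w)

def prob(phi, theory):
    """P(phi | theory) by the ratio rule; theory is a list of world-predicates."""
    den = sum(MU[w] for w in WORLDS if all(t(w) for t in theory))
    num = sum(MU[w] for w in WORLDS if all(t(w) for t in theory) and phi(w))
    return num / den

def log_odds(phi, theory):
    p = prob(phi, theory)
    return math.log(p / (1 - p))

s = atom("s")
theory = []                                            # T_0: the empty theory
print("lo(s | T_0) =", round(log_odds(s, theory), 3))
for n in range(10):
    r = atom(f"r{n}")
    # Choose the sign of r_n so that P(r_n | T_n u {not s}) <= 1/2 ...
    if prob(r, theory + [neg(s)]) > 0.5:
        r = neg(r)
    # ... and add the corresponding piece of evidence, s v (+-r_n).
    theory.append(disj(s, r))
    print(f"lo(s | T_{n+1}) =", round(log_odds(s, theory), 3))
```

Each step should raise the log odds of s by at least log 2 ≈ 0.69, matching the per-step bound in the proof; the toy model simply runs out of fresh atoms after ten steps, which is why no conflict with "probability 1 only for provable sentences" can appear here.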
Open Problem: Does this theorem still hold if we leave condition 3 out of the definition of coherence?
Charlie: Your proposal to remove condition 3 works when T is a finite theory, but you cannot do that when T is infinite. Indeed, we use 3 for the infinite extension from Tₙ to T∞. I suspect that you cannot remove condition 3.
Is the idea that the proof needed to apply 1 is of infinite length, and you want your logic to be finitary? Hm. This seems odd, because P(s|T∞) is in some sense already a function with an infinitely long argument. How do you feel about using 2 in the form of P(T∞|T₀) = P(Tₙ|T₀)⋅[some probability], therefore P(T∞|T₀) ≤ P(Tₙ|T₀), which has the same amount of argument as P(s|T∞)? I’m confused about at least one thing.
Also, is there some reason you prefer not to reply using the button below the comment?
It’s interesting that this is basically the opposite of the Gaifman condition—clearly there are conflicting intuitions about what makes a ‘good’ conditional logical probability.
On the open problem: in order to prove 3 from 2, all you need is that P(s∧T|R) = P(s|R) when s proves T. 3 follows from 2 if you do that substitution, and then divide by P(T|R), which is less than or equal to 1 (this may assume an extra commonsense axiom that probabilities are positive).
Now consider applying rule 1 to P(s∧T|R), with T proven by s. R proves that only one of s∧T, ¬s∧T, ¬s∧¬T is true, and also proves that only one of s, ¬s∧T, ¬s∧¬T is true. Thus 3 is derivable from 1 and 2.
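(Spelling the substitution out, for a finite T with s ⊢ T: rule 2 gives P(s∧T|R) = P(T|R)⋅P(s|R∪T), and the rule-1 observation above gives P(s∧T|R) = P(s|R), so

$$P(s\mid R\cup T)=\frac{P(s\mid R)}{P(T\mid R)}\ \ge\ P(s\mid R)$$

whenever P(T|R) > 0, which is exactly the finite-theory case of condition 3.)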
This is interesting! I would dispute, though, that a good logical conditional probability must be able to condition on arbitrary, likely-non-r.e. sets of sentences.
Hm; we could add an uninterpreted predicate symbol Q(n) to the language of arithmetic, and let s ≡ Q(0) and rₙ ≡ Q(n+1). Then, it seems like the only barrier to recursive enumerability of T∞ is that P’s opinions about Q(⋅) aren’t computable; this seems worrying in practice, since it seems certain that we would like logical uncertainty to be able to reason about the values of computations that use more resources than we use to compute our own probability estimates. But on the other hand, all of this makes this sound like an issue of self-reference, which is its own can of worms (once we have a computable process assigning probabilities to the value of computations, we can consider the sentence saying “I’m assigned probability < 1/2” etc.).
Nice! Basically, it looks like you construct a theory by assembling an infinite quantity of what the prior takes as evidence about s, so that either the prior or the posterior has to take the most extreme odds on s. It’s pretty intuitive in that light, and so I’m not dismayed that the “0 and 1 are not probabilities” property can’t hold when conditioned on arbitrary theories.
Important typo: P assigns conditional probability 1 to some statement that cannot be proven.