No Good Logical Conditional Probability
Fix a theory T over a language L. A coherent probability function is a function ℙ that assigns a number in [0,1] to each sentence of L and satisfies the laws of probability theory; each coherent probability function represents a probability distribution on complete logical extensions of T.
One of many equivalent definitions of coherence is that ℙ is coherent if ℙ(φ₁)+⋯+ℙ(φₖ)=1 whenever T can prove that exactly one of φ₁,…,φₖ is true.
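For instance, here are just two instances of this rule, with φ and ψ arbitrary sentences:

$$\mathbb{P}(\varphi)+\mathbb{P}(\neg\varphi)=1, \qquad \mathbb{P}(\varphi\wedge\psi)+\mathbb{P}(\varphi\wedge\neg\psi)+\mathbb{P}(\neg\varphi)=1,$$

since T proves that exactly one of φ, ¬φ is true, and that exactly one of φ∧ψ, φ∧¬ψ, ¬φ is true. Subtracting the first equation from the second recovers the familiar additivity law ℙ(φ)=ℙ(φ∧ψ)+ℙ(φ∧¬ψ).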
Another very basic desirable property is that ℙ(φ)=1 only when φ is provable. There have been many proposals of specific coherent probability assignments that all satisfy this basic requirement. Many satisfy stronger requirements that give bounds on how far ℙ(φ) is from 1 when φ is not provable.
In this post, I modify the framework slightly to instead talk about conditional probability. Consider a function ℙ which takes in a consistent theory T and a sentence φ, and outputs a number ℙ(φ|T) in [0,1], which represents the conditional probability of φ given T.
We say that ℙ is coherent if:
- ℙ(φ₁|T)+⋯+ℙ(φₖ|T)=1 whenever T can prove that exactly one of φ₁,…,φₖ is true, and
- ℙ(φ₁∧φ₂|T)=ℙ(φ₁|T)⋅ℙ(φ₂|T∪{φ₁}), and
- If φ proves every sentence in T′, then ℙ(φ|T∪T′)≥ℙ(φ|T).
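Two consequences of the first condition, which the argument below uses without comment, are worth spelling out (they are the one-element and two-element instances of the rule):

$$T\vdash\varphi\ \Rightarrow\ \mathbb{P}(\varphi\mid T)=1, \qquad T\vdash\varphi\leftrightarrow\psi\ \Rightarrow\ \mathbb{P}(\varphi\mid T)=\mathbb{P}(\psi\mid T).$$

For the first, apply the condition to the one-element list φ. For the second, compare the lists φ, ¬φ and ψ, ¬φ, each of which T proves to contain exactly one true sentence.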
Theorem: There is no coherent conditional probability function ℙ such that ℙ(φ|T)=1 only when T proves φ.
Proof:
This proof will use the notation of log odds to make things simpler: write lo(φ|T) for log(ℙ(φ|T)/ℙ(¬φ|T)), which is +∞ when ℙ(φ|T)=1 and −∞ when ℙ(φ|T)=0.
Let ℙ be a coherent conditional probability function. Fix a sentence s which is neither provable nor disprovable from the empty theory. Construct an infinite sequence of theories T₀ ⊆ T₁ ⊆ T₂ ⊆ ⋯ as follows:
- T₀ is the empty theory.
- To construct Tₙ₊₁, choose a sentence rₙ such that neither rₙ nor ¬rₙ is provable in Tₙ∪{¬s}. If ℙ(rₙ|Tₙ∪{¬s}) ≤ 1/2, then let Tₙ₊₁ = Tₙ∪{s∨rₙ}. Otherwise, let Tₙ₊₁ = Tₙ∪{s∨¬rₙ}.
Fix an n, and without loss of generality, assume ℙ(rₙ|Tₙ∪{¬s}) ≤ 1/2, so that Tₙ₊₁ = Tₙ∪{s∨rₙ}. Since ℙ is coherent we have

ℙ(s|Tₙ) = ℙ(s∧(s∨rₙ)|Tₙ) = ℙ(s∨rₙ|Tₙ)⋅ℙ(s|Tₙ₊₁).

In particular, this means that ℙ(s|Tₙ₊₁) ≥ ℙ(s|Tₙ).
Observe that ℙ(¬s∧(s∨rₙ)|Tₙ) = ℙ(¬s∧rₙ|Tₙ) = ℙ(¬s|Tₙ)⋅ℙ(rₙ|Tₙ∪{¬s}) ≤ ℙ(¬s|Tₙ)/2, and ℙ(¬s∧(s∨rₙ)|Tₙ) = ℙ(s∨rₙ|Tₙ)⋅ℙ(¬s|Tₙ₊₁). Therefore, ℙ(s∨rₙ|Tₙ)⋅ℙ(¬s|Tₙ₊₁) ≤ ℙ(¬s|Tₙ)/2, so ℙ(s|Tₙ₊₁)/ℙ(¬s|Tₙ₊₁) ≥ 2⋅ℙ(s|Tₙ)/ℙ(¬s|Tₙ).
In the language of log odds, this means that lo(s|Tₙ₊₁) ≥ lo(s|Tₙ) + log 2.
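Chaining this bound over the first n steps spells out the telescoping that the next two paragraphs rely on:

$$\mathrm{lo}(s\mid T_n)\ \ge\ \mathrm{lo}(s\mid T_0)+n\log 2 \qquad \text{for every } n.$$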
Let T∞ be the union of all the Tₙ. Note that s proves every sentence in T∞, so by the third condition of coherence, ℙ(s|T∞) ≥ ℙ(s|Tₙ) for all n, so lo(s|T∞) ≥ lo(s|Tₙ) for all n.
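That is, the instance of the third condition being used takes φ = s, T = Tₙ, and T′ = T∞ (so that T∪T′ = T∞); the hypothesis that s proves every sentence of T′ holds because every sentence of T∞ has the form s∨(±rₖ):

$$\mathbb{P}(s\mid T_\infty)=\mathbb{P}(s\mid T_n\cup T_\infty)\ \ge\ \mathbb{P}(s\mid T_n).$$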
Consider lo(s|T₀) and lo(s|T∞). These numbers cannot both be finite, since lo(s|T∞) ≥ lo(s|T₀) + n⋅log 2 for all n. Therefore, at least one of ℙ(s|T₀) and ℙ(s|T∞) must be 0 or 1. However, neither T₀ nor T∞ proves or disproves s, so this means that ℙ assigns conditional probability 1 to some statement that cannot be proven.
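To make the mechanics of the construction concrete, here is a small numerical sanity check (a sketch of the finite case only, not of the theorem, which needs infinitely many independent sentences). Propositional atoms stand in for s and the candidate rₙ, "worlds" are complete truth assignments, and conditional probability is computed by the ratio rule ℙ(φ|T) = μ(φ∧T)/μ(T), which satisfies conditions 1–3 in this finite setting. All names in the snippet (MU, prob, log_odds, and so on) are mine, not from the post.

```python
import itertools
import math
import random

ATOMS = ["s"] + [f"r{i}" for i in range(10)]   # s plus ten candidate sentences r_n
WORLDS = list(itertools.product([False, True], repeat=len(ATOMS)))

random.seed(0)
raw = {w: random.uniform(0.1, 1.0) for w in WORLDS}   # strictly positive weights
total = sum(raw.values())
MU = {w: x / total for w, x in raw.items()}           # a probability measure on worlds

def atom(name):
    i = ATOMS.index(name)
    return lambda w: w[i]

def neg(phi):
    return lambda w: not phi(w)

def disj(phi, psi):
    return lambda w: phi(w) or psi(w)

def prob(phi, theory):
    """P(phi | theory) by the ratio rule; theory is a list of world-predicates."""
    den = sum(MU[w] for w in WORLDS if all(t(w) for t in theory))
    num = sum(MU[w] for w in WORLDS if all(t(w) for t in theory) and phi(w))
    return num / den

def log_odds(phi, theory):
    p = prob(phi, theory)
    return math.log(p / (1 - p))

s = atom("s")
theory = []                                            # T_0: the empty theory
print("lo(s | T_0) =", round(log_odds(s, theory), 3))
for n in range(10):
    r = atom(f"r{n}")
    # Choose the sign of r_n so that P(r_n | T_n u {not s}) <= 1/2 ...
    if prob(r, theory + [neg(s)]) > 0.5:
        r = neg(r)
    # ... and add the corresponding piece of evidence, s v (+-r_n).
    theory.append(disj(s, r))
    print(f"lo(s | T_{n+1}) =", round(log_odds(s, theory), 3))
```

Each step should raise the log odds of s by at least log 2 ≈ 0.69, matching the per-step bound in the proof; the toy model simply runs out of fresh atoms after ten steps, which is why no conflict with "probability 1 only for provable sentences" can appear here.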
Open Problem: Does this theorem still hold if we leave condition 3 out of the definition of coherence?
Charlie: Your proposal to remove condition 3 works when T is a finite theory, but you cannot do that when T is infinite. Indeed, we use 3 for the infinite extension from Tₙ to T∞. I suspect that you cannot remove condition 3.
Is the idea that the proof needed to apply 1 is of infinite length, and you want your logic to be finitary? Hm. This seems odd, because P(s|T∞) is in some sense already a function with an infinitely long argument. How do you feel about using 2 in the form of P(T∞|T₀) = P(Tₙ|T₀)⋅[some probability], therefore P(T∞|T₀) ≤ P(Tₙ|T₀), which has the same amount of argument as P(s|T∞)? I’m confused about at least one thing.
Also, is there some reason you prefer not to reply using the button below the comment?
It’s interesting that this is basically the opposite of the Gaifman condition—clearly there are conflicting intuitions about what makes a ‘good’ conditional logical probability.
On the open problem: in order to prove 3 from 2, all you need is that P(s∧T|R) = P(s|R) when s proves T. 3 follows from 2 if you do that substitution, and then divide by P(T|R), which is less than or equal to 1 (this may assume an extra commonsense axiom that probabilities are positive).
Now consider applying rule 1 to P(s∧T|R), with T proven by s. R proves that only one of s∧T, ¬s∧T, ¬s∧¬T is true, and also proves that only one of s, ¬s∧T, ¬s∧¬T is true. Thus 3 is derivable from 1 and 2.
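(Spelling the substitution out, for a finite T with s ⊢ T: rule 2 gives P(s∧T|R) = P(T|R)⋅P(s|R∪T), and the rule-1 observation above gives P(s∧T|R) = P(s|R), so

$$P(s\mid R\cup T)=\frac{P(s\mid R)}{P(T\mid R)}\ \ge\ P(s\mid R)$$

whenever P(T|R) > 0, which is exactly the finite-theory case of condition 3.)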
This is interesting! I would dispute, though, that a good logical conditional probability must be able to condition on arbitrary, likely-non-r.e. sets of sentences.
Hm; we could add an uninterpreted predicate symbol Q(n) to the language of arithmetic, and let s ≡ Q(0) and rₙ ≡ Q(n+1). Then, it seems like the only barrier to recursive enumerability of T∞ is that P’s opinions about Q(⋅) aren’t computable; this seems worrying in practice, since it seems certain that we would like logical uncertainty to be able to reason about the values of computations that use more resources than we use to compute our own probability estimates. But on the other hand, all of this makes this sound like an issue of self-reference, which is its own can of worms (once we have a computable process assigning probabilities to the value of computations, we can consider the sentence saying “I’m assigned probability < 1/2” etc.).
Nice! Basically, it looks like you construct a theory by assembling an infinite quantity of what the prior takes as evidence about s, so that either the prior or the posterior has to take the most extreme odds on s. It’s pretty intuitive in that light, and so I’m not dismayed that the “0 and 1 are not probabilities” property can’t hold when conditioned on arbitrary theories.
Important typo: P assigns conditional probability 1 to some statement that cannot be proven.