Logic in the language of probability
This post is a minor note, to go along with the post on the probabilistic Löb theorem. It simply seeks to justify why terms like “having probability 1” are used interchangeably with “provable”, and why implication symbols “→” can be used in a probabilistic setting.
Take a system of classical logic with a single rule of inference, modus ponens:
From A and A→B, deduce B.
Having a single rule of inference isn’t much of a restriction, because you can replace any other rule of inference (“from A1, A2, …, and An, deduce B”) with an axiom or axiom schema (“A1∧A2∧…∧An → B”) and then use modus ponens on that axiom to recover the effect of the original rule.
In this logical system, I’m now going to make some purely syntactical changes—not changing the meaning of anything, just the way we write things. For any sentence A that doesn’t contain an implication arrow →, replace
A with P(A)=1.
Similarly, replace any sentence of the type
A → B with P(B|A)=1.
This is recursive, so we replace
(A → B) → C with P(C | P(B|A)=1)=1.
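To make the recursion concrete, here is a minimal sketch of the rewriting in code; the representation and function names are my own, purely for illustration:

```python
# A sketch of the purely syntactic rewriting described above.  Formulas
# are either a string (an atomic sentence) or a pair (antecedent,
# consequent) standing for an implication.

def as_event(formula):
    """How a formula is written when it appears inside P( ... )."""
    if isinstance(formula, str):
        return formula            # an atom appears bare: just "A"
    return translate(formula)     # an implication appears as its full translation

def translate(formula):
    """A becomes P(A)=1; an implication becomes P(...|...)=1, recursively."""
    if isinstance(formula, str):
        return f"P({formula})=1"
    antecedent, consequent = formula
    return f"P({as_event(consequent)}|{as_event(antecedent)})=1"

print(translate("A"))                 # P(A)=1
print(translate(("A", "B")))          # P(B|A)=1
print(translate((("A", "B"), "C")))   # P(C|P(B|A)=1)=1
```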
And instead of using modus ponens, we’ll use a rule that combines Bayesian inference with the law of total probability:
From P(A)=1 and P(B|A)=1, deduce P(B)=1.
Again, these are purely syntactic changes—just rewriting the symbols of a formal system in an entirely equivalent fashion. What this demonstrates, though, is that we can do classical logic using the language and the tools of probability—and hence that the objects and laws of classical logic have probabilistic counterparts.
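As a quick sanity check, here is a toy finite probability space (my own example, not part of the original argument) on which the combined rule visibly holds, since P(B) ≥ P(A∧B) = P(B|A)·P(A) = 1:

```python
# A toy finite model of the combined rule: outcomes are (a, b)
# truth-value pairs with probability weights.  All the weight sits where
# A and B both hold, so P(A)=1 and P(B|A)=1, and P(B)=1 follows.

space = {(True, True): 1.0, (True, False): 0.0,
         (False, True): 0.0, (False, False): 0.0}

def prob(event):
    """Total weight of the outcomes where `event` holds."""
    return sum(w for outcome, w in space.items() if event(outcome))

p_a = prob(lambda o: o[0])              # P(A) = 1
p_ab = prob(lambda o: o[0] and o[1])    # P(A and B) = 1
p_b_given_a = p_ab / p_a                # P(B|A) = 1
p_b = prob(lambda o: o[1])              # P(B) = 1

assert p_a == 1.0 and p_b_given_a == 1.0 and p_b == 1.0
```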
In fact, we can do a bit better and make the similarity more pronounced. Instead of having modus ponens, allow for two “logical” rules of inference:
From A → B and A, deduce A∧B.
From A∧B, deduce B (simplification).
Together, these two rules give modus ponens, and if we’ve chosen our axioms in non-weird ways, we shouldn’t be able to prove anything we couldn’t before. Then when writing our logic in the language of probability, we similarly have two “probabilistic” rules of inference:
From P(B|A)=s and P(A)=t, deduce P(A∧B)=st (conditional probability).
From P(A∧B)=s, deduce P(B)≥s.
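For illustration, the same kind of toy model (again my own, now with arbitrary weights) checks both rules numerically:

```python
# Checking the two "probabilistic" rules of inference on a space with
# general weights:
#   P(A and B) = P(B|A) * P(A)   (conditional probability)
#   P(B) >= P(A and B)           (simplification)

space = {(True, True): 0.3, (True, False): 0.2,
         (False, True): 0.4, (False, False): 0.1}

def prob(event):
    return sum(w for outcome, w in space.items() if event(outcome))

p_a = prob(lambda o: o[0])              # 0.5
p_ab = prob(lambda o: o[0] and o[1])    # 0.3
p_b = prob(lambda o: o[1])              # 0.7
p_b_given_a = p_ab / p_a                # 0.6

assert abs(p_ab - p_b_given_a * p_a) < 1e-12
assert p_b >= p_ab
```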
Here, we’ve sneakily introduced the possibility of probabilities not equal to 1: even though they will never appear in our syntax, the rules allow for them. In order to show complete equivalence between the two systems, we merely have to require that P(B)≥1 be just another way of writing P(B)=1.
We can extend the correspondence still further by allowing P(A)=0 to be another way of writing P(¬A)=1 (and stating that P(B)≥0 is an always true but vacuous statement). If we do so, we’d need to add more “logical” rules of inference to maintain the strict equivalence with the “probabilistic” rules of inference above; but again, if our axioms are non-weird, we wouldn’t actually be able to prove anything we couldn’t before.
I won’t belabour the point any further; I’ll just reiterate that classical logic can be done in the language of probability, which is why I’ll be somewhat sloppy about language when comparing these two worlds.
Problem: If A is false, then A → B is true, but P(B|A) is undefined.
Then define by fiat that P(B|A)=1 whenever P(A)=0.
But then if P(A)=0, P(B|A) + P(¬B|A) = 2, which seems undesirable.
Since P(B|A) is properly undefined when P(A)=0, any attempt to define it will have some negative features.
I have a couple of questions.
Are you sure this is right? After all, the implication is also true in the case of A being false; the conjunction certainly is not.
Intuitively, I suggest there should be an inequality, too, seeing as B|A is not necessarily independent of A.
He specifically states that A is true, as well as A ⇒ B.
B|A is not an event, so it makes no sense to talk about whether or not it is independent of A.
To see why this is a valid theorem, break it up into three possibilities: P(A & B) = x, P(A & ~B) = y, P(~A) = 1 - x - y.
Then P(A) = P(A & B) + P(A & ~B) = x + y
For P(B|A), restrict to the space where A is true; this has size x + y, of which B takes up x, so P(B|A) = x / (x + y)
Thus P(A)P(B|A) = x = P(A & B)
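To make this numeric (a quick check of my own, with arbitrary x and y): pick any x, y with x + y ≤ 1 and verify the identity directly.

```python
# P(A & B) = x, P(A & ~B) = y, so P(~A) = 1 - x - y.
x, y = 0.25, 0.35
p_a = x + y                    # P(A)
p_b_given_a = x / (x + y)      # P(B|A)
assert abs(p_a * p_b_given_a - x) < 1e-12   # P(A)P(B|A) = P(A & B)
```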
Thank you. That is what I deserve for cursory reading.