An Attempt at Logical Uncertainty
At the recent Tel Aviv meetup, after a discussion of the open problems in the field of FAI, we reached the conclusion that the problem of logical uncertainty is one of the most important problems open today. In this post I will try to give a few insights I had on this problem, which can be thought of as the problem of constructing a (non-degenerate) probability measure over the set of statements of an arbitrary logical system.
To clarify my goal: I’m trying to make a Solomonoff-like system for assigning probabilities to logical statements. For reasons much like the reasons that Solomonoff Induction is uncomputable, this system will be uncomputable as well. This puts certain limits on its usefulness, but it’s certainly a start. Solomonoff Induction is very useful, if only as a limiting case of computable processes. I believe the system I present here has some of that same value when it comes to the problem of logical uncertainty.
The obvious tack to take is one involving proof-lengths: It is reasonable to say that a sentence that is harder to prove should have a lower measure.
Let’s start with a definition:
For each provably true statement φ, let ProofLength(φ) be the length of φ's shortest proof. For each true but unprovable statement φ, let ProofLength(φ) be ∞.
Therefore, if we have some probability measure P over the true statements of our system, we want P to be monotonically decreasing with respect to proof length (i.e. P(φ1) < P(φ2) ⇔ ProofLength(φ1) > ProofLength(φ2)).
For obvious reasons, we want to assign probability 1 to our axioms. As an axiom always has a proof of length one (the mere statement of the axiom itself, without any previous statements that it is derived from), we want the property P(φ) = 1 ⇔ ProofLength(φ) = 1.
Lastly, for statements that are neither provable nor disprovable, we have to assign a probability of ½. Why? Let φ be such a statement. Because P is a probability measure, P(φ)+P(~φ) = 1. Exactly one of φ and ~φ is true, but as neither is provable, we have no way of knowing, even in theory, which is which. Thus, by symmetry, P(φ)=P(~φ)=½.
Given these desiderata, let us see what measure P we can construct to satisfy them.
For each true statement φ, I will define P(φ) to be 2^(-ProofLength(φ)) + 2^(-1). This seems at first rather arbitrary, but it matches the desiderata in a fairly elegant way. This P is monotonically decreasing with respect to proof length, as we demanded, and it correctly assigns probability 1 to our axioms (2^(-1) + 2^(-1) = 1). It also correctly assigns probability ½ to unprovable statements (2^(-∞) + 2^(-1) = ½).
For this to be a probability measure over the set of all statements, we still need to define it on the false statements as well. This is trivial, as we can define P(φ)=1-P(~φ) for all false statements φ. This gives us all the properties we might demand of a logical probability measure that is based on proof lengths:
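To make this concrete, here is a minimal sketch in Python. ProofLength is uncomputable in general, so the table below is a hypothetical stand-in, with made-up statement names and lengths:

```python
import math

# Hypothetical stand-in for the (uncomputable) proof-length oracle:
# shortest proof length of each true statement, math.inf if unprovable.
PROOF_LENGTH = {
    "axiom": 1,               # an axiom is its own one-line proof
    "easy_theorem": 3,
    "hard_theorem": 40,
    "independent": math.inf,  # true, but neither provable nor disprovable
}

def p_true(phi):
    """Prior of a true statement: 2^(-ProofLength(phi)) + 2^(-1)."""
    return 2.0 ** (-PROOF_LENGTH[phi]) + 0.5

def p_false(negation):
    """Prior of a false statement phi, defined as 1 - P(~phi);
    `negation` names the true statement ~phi."""
    return 1.0 - p_true(negation)

# The three desiderata:
assert p_true("axiom") == 1.0                           # axioms get probability 1
assert p_true("easy_theorem") > p_true("hard_theorem")  # monotone in proof length
assert p_true("independent") == 0.5                     # unprovable statements get 1/2
```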
Statements that can be easily proved true are given more measure than those that are hard (or even impossible) to prove to be true.
And in reverse for false statements: Statements that can be easily proved to be false are given a lower measure than those that are hard (or even impossible) to prove to be false.
Specifically, it assigns probability 1 to axioms (and 0 to the negations of axioms.)
I have no idea if this is the best way to construct a logical probability measure, but it seems as decent a way as any to assign priors to logical statements, and it is certainly a start.
That handles priors, but it doesn’t seem to give an easy way to update on evidence. Obviously, once you prove something is true or false, you want its probability to rise to 1 or drop to 0 accordingly. Also, if you take some steps in the direction of proving something to be true or false, you want its probability to rise or fall accordingly.
To take a logical sentence and just blindly assign it the probability described above, ignoring everything else, is just as bad as taking the Solomonoff prior for the probability of an ordinary statement (in the standard system of probability) and refusing to update away from it. The role of the P described above is very much like the role of the Solomonoff prior in normal inductive Bayesian reasoning. This is nice, and perhaps a step forward, but it isn’t a system that can be used by itself for making decisions.
Luckily, there is a way to update on information in Solomonoff induction. You merely “slice off” the worlds that are now impossible given the new evidence, and recalculate. (It can be proven that doing so is equivalent to updating using Bayes’ Theorem.)
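As a toy illustration of that slice-and-recalculate update (a finite sketch, not Solomonoff induction itself):

```python
def update(worlds, consistent):
    """Drop the worlds the evidence rules out, then renormalize.
    With 0/1 likelihoods this is exactly an update by Bayes' theorem."""
    survivors = {w: p for w, p in worlds.items() if consistent(w)}
    total = sum(survivors.values())
    return {w: p / total for w, p in survivors.items()}

worlds = {"w1": 0.5, "w2": 0.25, "w3": 0.25}
print(update(worlds, lambda w: w != "w3"))  # w1 -> 2/3, w2 -> 1/3
```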
To my delight, I found that something similar is possible with this system too! This is the truly important insight here, as this gives (for the first time, so far as I know) a method for actually updating on logical probabilities, so that as you advance towards a proof, your probability estimate of that sentence being true approaches 1, but only reaches 1 once you have a full proof.
What you do is exactly the same as in Solomonoff induction: every time you prove something, you update by recalculating the probability of every statement, given that you now know the newly proven thing. Informally, the reasoning is this: if you have proved a statement, you know it with probability 1, or in other words, it can be treated as effectively a new axiom. So add it to your axioms, and you will get your updated probabilities!
In more technical terms: in a given logical system S, P(φ|ψ) is defined as the P(φ) I described above, computed in the logical system S+ψ rather than in S. This obeys all the properties we want from an update on evidence: an update increases the probability we assign to a statement once we have proved part of it, or a lemma for it, and decreases the probability we assign to a statement once we have proved part of its negation, or a lemma for the proof of its negation.
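In code, the update is just the same prior relativized to a growing axiom set. This is only a sketch: proof_length is the same hypothetical, uncomputable oracle as before, now taking an axiom set as a parameter.

```python
import math

def proof_length(phi, axioms):
    """Hypothetical oracle: shortest proof length of phi from the axiom
    set `axioms`, or math.inf if phi is unprovable there."""
    raise NotImplementedError  # uncomputable in general; a stand-in only

def p(phi, axioms):
    """The prior described above, relative to a given axiom set."""
    return 2.0 ** (-proof_length(phi, axioms)) + 0.5

def p_given(phi, psi, axioms):
    """P(phi | psi): the same prior computed in S + psi, i.e. with the
    newly proven statement psi promoted to an axiom."""
    return p(phi, frozenset(axioms) | {psi})
```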
This is not a complete theory of logical uncertainty, but it could be a foundation. It certainly includes some insights I haven’t seen before, or at least that I haven’t seen explained in these terms. In the upcoming weeks the Tel Aviv meetup group is planning to do a MIRIx workshop on the topic of logical uncertainty, and we hope to make some real strides in it. Perhaps we will expand on this, or perhaps we will come up with some other avenue of attack. If we can give logical uncertainty a formal grounding, that will be a fairly major step. After all, the black box of logical uncertainty sits right at the heart of most attempts to advance AGI, and at the moment it is merely handwaved away in most applications. But eventually it needs an underpinning, and that is what we are aiming at.
Hi Ben,
If you have not already seen them, you should probably read this and this, which are two ways to define a coherent probability measure on sentences. The first link is my construction, and the second is Abram Demski’s. Abram and I are working on this problem in the MIRIxLosAngeles group.
You seem to be doing something different from what we are doing, in that our probability functions assign probability 1 to all provable statements and try to answer the question of what to do with statements which are neither provable nor disprovable. (If you want your probability function to represent a probability measure on models of your logical system, then you can’t just give them all probability 1/2.)
For both of our constructions, the probability assignments are approximable, in the sense that we have a computation which updates probabilities and converges to (but never exactly equals) the probabilities in our construction; both of our computations work by updating as you observe proofs of sentences.
In particular I have a procedure which (I only conjecture right now) converges to my probability assignment by noticing over and over again “I can prove that exactly one of A, B, and C is true, but their probabilities do not sum to one” and then shifting the probabilities of these sentences accordingly. (This is not written up in the post I linked to, but I can explain it more if you like.)
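As a toy illustration of one possible shift (proportional rescaling is just the simplest choice; the actual shift rule may differ, this is only a sketch):

```python
def shift_partition(p, sentences):
    """Given sentences of which provably exactly one is true, rescale
    their probabilities so that they sum to 1. Proportional rescaling
    is only one possible choice of shift."""
    total = sum(p[s] for s in sentences)
    for s in sentences:
        p[s] = p[s] / total

p = {"A": 0.5, "B": 0.5, "C": 0.5}   # all three start at 1/2
shift_partition(p, ["A", "B", "C"])  # now each is 1/3 and they sum to 1
```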
If you do look at them, feel free to ask me questions about either construction, or if you would like, I can talk to you about what we’ve done in a Skype call or something. I have a few conjectures that I would like to prove related to my construction.
Also, neither one of us pays any attention to proof length, so it would seem reasonable for you to ignore what we did and go your own direction. However, if you find yourself thinking mostly about how to assign probabilities to unprovable things, then it is worth looking at.
I do have one suggestion on notation. I suggest you reserve the term “probability measure” for something more strict than just assignments of probabilities to sentences. Say “probability function” or “probability assignment.” I would say a “probability measure on sentences” should represent a way to choose a single sentence at random, and a “probability measure on models” should be a way to choose a single model of your axioms at random.
I am excited to see what you guys do!
Thanks for those links! They both look very interesting, and I’ll read them in depth.
As you mention, you are doing something slightly different. You are assigning probability 1 to all the provable sentences, and then trying to investigate the unprovable ones. I, on the other hand, am taking the unprovable ones as just that, unprovable, and focusing on assigning probability mass to the provable ones.
I think the question of how to assign probability mass to provable, yet not yet proven, statements is the really important part of logical uncertainty. That’s the part that is handwaved away in discussions of, say, UDT, and so is the part that I want to focus on.
About your suggestion on notation: yes, I was being slightly casual with notation there. By construction, I think it is a measure, as it always gives probabilities in the range [0,1] and it satisfies P(φ) + P(~φ) = 1. I didn’t actually prove that the measure of a disjunction of mutually exclusive sentences equals the sum of their measures, but I think it follows... More work is needed on this. At the moment, this only gives probabilities to individual sentences, and not to groups of sentences, so technically that wouldn’t work at all. The obvious next step is to try to extend it so that it can. But until that is done, you are correct that it is an abuse of notation to call it a measure.
I do not think it is a measure. If A, B, and C are all unprovable, undisprovable, but provably disjoint sentences, then your system cannot assign P(A v B v C) = P(A) + P(B) + P(C), because that sum must be 3/2.
I think that the thing that makes logical uncertainty hard is the fact that you can’t just talk about probability measures (on models), because by definition a probability measure on models must assign probability 1 to all provable sentences.
That’s a good point, and I concede that you are right. At the moment, it’s more of a “probability assignment”, as you said, rather than a probability measure. More work needs to be done on the subject, and hopefully we will progress along these lines at the MIRIx workshop.
You can’t do that. Let φ and ψ be sentences such that φ, ψ, and (φ ^ ψ) are all neither provable nor disprovable. If all of these sentences are given probability 1⁄2, then since φ and (φ ^ ψ) have the same probability, (φ ^ ~ψ) must have probability 0. That is, φ implies ψ with probability 1. By symmetry, ψ implies φ with probability 1, so φ and ψ are equivalent with probability 1. But there exist such pairs of sentences that are not equivalent.
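Spelling out the middle step above: by additivity, P(φ) = P(φ ^ ψ) + P(φ ^ ~ψ), so P(φ ^ ~ψ) = 1/2 - 1/2 = 0, and hence P(φ → ψ) = 1 - P(φ ^ ~ψ) = 1.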
Edit: Actually, it looks like my argument doesn’t apply to your system, because it does not satisfy the axioms of a probability measure. For instance, if φ has a short proof that does not mention ψ anywhere, and ψ does not have a short proof, then (φ v ψ) will be provable, with its shortest proof being the proof of φ with the rule of inference φ |- (φ v ψ) appended to the end. That is a longer proof, and thus P(φ v ψ) < P(φ), which is ridiculous. Another reason I don’t like the idea of assigning probabilities based on proof length is that in order to compute the probability, you have to find a proof, and by that time, you may as well give the statement probability 1. The only reason I would want to assign a probability other than 1 to a provable statement is if I didn’t already know that it was provable.
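For concreteness (numbers made up for illustration): if ProofLength(φ) = 3, the construction above gives ProofLength(φ v ψ) = 4, so P(φ) = 2^(-3) + 1/2 = 0.625 while P(φ v ψ) = 2^(-4) + 1/2 = 0.5625, even though φ entails (φ v ψ).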
To add to the hail of links, you might want to inspect the big official MIRI progress report on the problem here.
Also, though I know quite a bit less about this topic than the other people here (somebody correct me if I’m wrong), I’m a little suspicious of this distribution because I don’t see any way to approximate the length of the shortest proof. Given an unproven mathematical statement which you aren’t sure is true or false, how could you establish even a rough estimate of how hard it is to prove, short of actually trying to prove it?
Hi Benjamin! Have some links you might already have read.
For an introduction to the topic, you can’t go wrong with the scholarly literature: Gaifman’s “Reasoning with limited resources and assigning probabilities to arithmetical statements.”
I will also happily recommend my own posts on the subject, beginning at the very beginning, here.
For breadth, you might also check out Abram Demski’s description, which is good to think about though unsatisfying to me. There’s also some discussion on LessWrong, and a MIRI document somewhere uses a slightly modified version.
Anyhow, I won’t cotton to any method of assigning a logical probability that takes longer than just brute-forcing the right answer. For this particular problem I think a bottom-up approach is what you want to use.
I appreciate the links. I haven’t read Gaifman’s paper before, so I’ll go ahead and read that.
I see the sentiment there, and that too is a valid approach. That said, after trying to use the bottom-up approach many times and failing, and after seeing others fail using bottom-up approaches, I think that if we can at least build a nonconstructive top-down theory, that would be a starting point. After all, Solomonoff Induction is completely top down, yet it’s a very powerful theoretical tool.
One nonconstructive (and wildly uncomputable) approach to the problem is this one: http://www.hutter1.net/publ/problogics.pdf
<.<
>.>
Well, care to explain what I did wrong?
Seconding Coscott’s and Manfred’s comments. I also had a post on this topic, which seems to solve one use case of logical uncertainty, namely, choosing between several logical counterfactuals that are all provably true.
In addition to the links provided by others, I also recommend looking at Paul’s new report: “Non-omniscience, probabilistic inference, and metamathematics.”
Edit: Oops, this was already linked from this comment.