An Attempt at Logical Uncertainty
At the recent Tel Aviv meetup, after a discussion of the open problems in the field of FAI, we reached the conclusion that logical uncertainty is one of the most important open problems today. In this post I will try to give a few insights I had on this problem, which can be thought of as the problem of constructing a (non-degenerate) probability measure over the set of statements of an arbitrary logical system.
To clarify my goal: I’m trying to make a Solomonoff-like system for assigning probabilities to logical statements. For reasons much like the reasons Solomonoff Induction is uncomputable, this system will be uncomputable as well. This puts certain limits on its usefulness, but it’s certainly a start. Solomonoff Induction is very useful, if only as a limiting case of computable processes. I believe the system I present here has some of that same value when it comes to the problem of logical uncertainty.
The obvious tack to take is one involving proof-lengths: It is reasonable to say that a sentence that is harder to prove should have a lower measure.
Let’s start with a definition:
For each provably true statement φ, let ProofLength(φ) be the length of φ’s shortest proof. For every true but unprovable statement φ, let ProofLength(φ) be ∞.
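To make the definition a bit more concrete, here is a minimal (and hopelessly inefficient) sketch of what computing ProofLength by brute force might look like. The statement pool, the is_valid_step derivation checker, and the max_length cutoff are all hypothetical stand-ins for whatever the underlying logical system provides; since proof search is only semi-decidable, the cutoff is what substitutes for the infinite proof length of unprovable statements.

```python
from itertools import product

def proof_length(phi, axioms, statement_pool, is_valid_step, max_length):
    """Shortest proof length of phi, by brute-force search up to max_length.

    A "proof" is a sequence of statements, each of which is either an axiom
    or follows from the earlier steps according to is_valid_step (a stand-in
    for the system's derivation rules). Returns infinity if no proof of
    length <= max_length is found, approximating "unprovable".
    """
    for length in range(1, max_length + 1):
        for candidate in product(statement_pool, repeat=length):
            if candidate[-1] != phi:
                continue  # a proof of phi must end with phi itself
            if all(step in axioms or is_valid_step(step, candidate[:i])
                   for i, step in enumerate(candidate)):
                return length
    return float("inf")
```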
Therefore, if we have some probability measure P over the true statements of our system, we want P to be monotonically decreasing with respect to proof length (i.e. P(φ₁) < P(φ₂) ⇔ ProofLength(φ₁) > ProofLength(φ₂)).
For obvious reasons, we want to assign probability 1 to our axioms. As an axiom always has a proof of length one (the statement of the axiom itself, with no prior statements from which it is derived), we want the property P(φ) = 1 ⇔ ProofLength(φ) = 1.
Lastly, for statements that are unprovable, we have to assign a probability of ½. Why? Let φ be a true but unprovable statement. Because P is a probability measure, P(φ) + P(~φ) = 1. Exactly one of φ and ~φ is true, but since neither is provable (if our system is sound, the false one, ~φ, cannot be proved either), we have no way of knowing, even in principle, which is which. Thus, by symmetry, P(φ) = P(~φ) = ½.
Given these desiderata, let’s see what measure P we can construct to satisfy them.
For each true statement φ, I will define P(φ) to be 2^(−ProofLength(φ)) + 2^(−1). This seems at first rather arbitrary, but it matches the desiderata in a fairly elegant way. This P is monotonically decreasing in the proof length, as we demanded, and it correctly assigns a probability of 1 to our axioms. It also correctly assigns probability ½ to unprovable statements.
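Here is that formula as a couple of lines of Python, just to check that it really does meet the desiderata (nothing here goes beyond the definition above):

```python
def prior_true(proof_length):
    """Prior probability of a true statement, given its shortest proof length."""
    return 2.0 ** (-proof_length) + 2.0 ** (-1)

assert prior_true(1) == 1.0             # an axiom: proof of length 1
assert prior_true(2) == 0.75            # one derivation step away from an axiom
assert prior_true(float("inf")) == 0.5  # true but unprovable
```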
For this to be a probability measure over the set of all statements, we still need to define it on the false statements as well. This is trivial, as we can define P(φ)=1-P(~φ) for all false statements φ. This gives us all the properties we might demand of a logical probability measure that is based on proof lengths:
Statements that can be easily proved true are given more measure than those that are hard (or even impossible) to prove to be true.
And in reverse for false statements: Statements that can be easily proved to be false are given a lower measure than those that are hard (or even impossible) to prove to be false.
Specifically, it assigns probability 1 to axioms (and 0 to the negations of axioms).
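Putting the two halves together, here is a small sketch of the full prior over arbitrary statements. The encoding is hypothetical (I pass in which side of the pair (φ, ~φ) is true, together with the proof length of that true side), but the numbers check the properties listed above:

```python
def prior(phi_is_true, proof_length_of_true_side):
    """Prior of an arbitrary statement phi.

    phi_is_true says whether phi is the true member of the pair (phi, ~phi);
    proof_length_of_true_side is ProofLength of whichever member is true.
    True statements use the formula above; false ones get P(phi) = 1 - P(~phi).
    """
    p_true_side = 2.0 ** (-proof_length_of_true_side) + 2.0 ** (-1)
    return p_true_side if phi_is_true else 1.0 - p_true_side

# The negation of an axiom is refuted in a single step, so it gets probability 0.
assert prior(False, 1) == 0.0
# A false statement whose negation takes a 3-step proof gets probability 3/8.
assert prior(False, 3) == 0.375
# A false but irrefutable statement sits at 1/2, like its unprovable negation.
assert prior(False, float("inf")) == 0.5
```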
I have no idea if this is the best way to construct a logical probability measure, but it seems like a start, and as decent a way as any to assign priors to logical statements.
That handles priors, but it doesn’t seem to give an easy way to update on evidence. Obviously, once you prove something is true or false, you want its probability to rise to 1 or drop to 0 accordingly. Also, if you take some steps in the direction of proving something to be true or false, you want its probability to rise or fall accordingly.
To take a logical sentence and just blindly assign it the probability described above, ignoring everything else, is just as bad as taking the Solomonoff prior as the probability of an ordinary statement (in the standard setting of probability) and refusing to update away from it. The role of the P described above is very much like the role of the Solomonoff prior in ordinary inductive Bayesian reasoning. This is nice, and perhaps it is a step forward, but it isn’t a system that can be used by itself for making decisions.
Luckily, there is a way to update on information in Solomonoff induction. You merely “slice off” the worlds that are now impossible given the new evidence, and recalculate. (It can be proven that doing so is equivalent to updating using Bayes’ Theorem.)
To my delight, I found that something similar is possible with this system too! This is the truly important insight here, as this gives (for the first time, so far as I know) a method for actually updating on logical probabilities, so that as you advance towards a proof, your probability estimate of that sentence being true approaches 1, but only reaches 1 once you have a full proof.
What you do is exactly the same as in Solomonoff induction: every time you prove something, you update by recalculating the probability of every statement, given that you now know the newly proven thing. Informally, the reasoning is like this: if you have proved a statement, you know it with probability 1; in other words, it can be treated as effectively a new axiom. So add it to your axioms, recompute, and you will get your updated probabilities!
In more technical terms, in a given logical system S, P(φ|ψ) is defined as the P(φ) I described above, computed in the logical system S+ψ rather than in S. This obeys all the properties we want from an update on evidence: an update increases the probability we assign to a statement once we have proved part of it, or a lemma toward it, and decreases the probability we assign to a statement once we have proved part of its negation, or a lemma toward its negation.
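As a sketch, assuming the same hypothetical proof-length machinery as before (the real quantity is uncomputable), the update rule for a true statement φ looks like this; false statements are handled through P(φ|ψ) = 1 − P(~φ|ψ), exactly as in the unconditional case:

```python
def conditional_prior_true(phi, psi, axioms, proof_length_in):
    """P(phi | psi) for a true statement phi: the prior recomputed in S + psi.

    proof_length_in(phi, axioms) stands in for whatever procedure returns the
    shortest proof length of phi from a given axiom set. Updating on psi just
    means adding psi to the axioms and recomputing the prior.
    """
    new_axioms = set(axioms) | {psi}
    length = proof_length_in(phi, new_axioms)
    return 2.0 ** (-length) + 2.0 ** (-1)
```

As more lemmas are proven and folded into the axioms, the shortest remaining proof of φ shrinks, so this value climbs; once φ itself has been proven and added, its proof length in the new system is 1 and the probability is exactly 1.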
This is not a complete theory of logical uncertainty, but it could be a foundation. It certainly includes some insights I haven’t seen before, or at least that I haven’t seen explained in these terms. In the upcoming weeks the Tel Aviv meetup group is planning to do a MIRIx workshop on the topic of logical uncertainty, and we hope to make some real strides in it. Perhaps we will expand on this, or perhaps we will come up with some other avenue of attack. If we can give logical uncertainty a formal grounding, that will be a fairly major step. After all, the black box of logical uncertainty sits right at the heart of most attempts to advance AGI, and at the moment it is merely handwaved away in most applications. But eventually it needs an underpinning, and that is what we are aiming at.