Overcoming the Loebian obstacle using evidence logic

In this post I intend to:

Briefly explain the Loebian obstacle and its relevance to AI (feel free to skip this if you know what the Loebian obstacle is).

Suggest a solution in the form of a formal system which assigns probabilities (more generally, probability intervals) to mathematical sentences, and which admits a form of “Loebian” self-referential reasoning. The method is well-defined both for consistent and inconsistent axiomatic systems, the latter being important in the analysis of logical counterfactuals, as in UDT.
Background
Logic
When can we consider a mathematical theorem to be established? The obvious answer is: when we have proved it. Wait, proved it in what theory? Well, that’s debatable. ZFC is a popular choice for mathematicians, but how do we know it is consistent (let alone sound, i.e. that it only proves true sentences)? All those spooky infinite sets: how do you know it doesn’t break somewhere along the line? There’s lots of empirical evidence, but we can’t prove it, and it’s proofs we’re interested in, not mere evidence, right?
Peano arithmetic seems like a safer choice. After all, if the natural numbers don’t make sense, what does? Let’s go with that. Suppose we have a sentence s in the language of PA. If someone presents us with a proof p of s in PA, we believe s is true. Now consider the following situation: instead of giving you a proof of s, someone gives you a PA-proof p1 that p exists. After all, PA admits defining “PA-proof” in the language of PA. Common sense tells us that p1 is a sufficient argument to believe s. Maybe we can prove this within PA? That is, if we have a proof of “if a proof of s exists then s” and a proof of R(s) := “a proof of s exists”, then we have just proved s. That’s just modus ponens.
There are two problems with that.
First, there’s no way to prove the sentence L := “for all s, if R(s) then s”, since it’s not a PA-sentence at all. The problem is that “for all s” references s as a natural number encoding a sentence, while “then s” references s as the truth-value of that sentence. Maybe we can construct a PA-formula T(s) which means “the sentence encoded by the number s is true”? Nope, that would get us in trouble with the liar paradox (it would be possible to construct a sentence saying “this sentence is false”).
Second, Loeb’s theorem says that if we can prove L(s) := “if R(s) then s” for a given s, then we can prove s. This is a problem, since it means there is no way to prove L(s) for all s in any sense: it’s unprovable for those s which are unprovable. In other words, even if you proved not-s, there is no way to conclude that “no proof of s exists”.
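For reference, here is the standard formal statement of Loeb’s theorem (Prov_PA is the usual arithmetized provability predicate; this formulation is textbook material rather than anything specific to this post):

```latex
% Loeb's theorem: for any sentence s in the language of PA
% (and likewise for any recursively axiomatizable theory containing PA),
\mathrm{PA} \vdash \left(\mathrm{Prov}_{\mathrm{PA}}(\ulcorner s \urcorner) \rightarrow s\right)
\quad\Longrightarrow\quad
\mathrm{PA} \vdash s
```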
What if we add an inference rule Q to our logic, allowing us to go from R(s) to s? Let’s call the new formal system PA1. p1 appended by a Q-step becomes an honest proof of s in PA1. Problem solved? Not really! Now someone can give you a proof of R1(s) := “a PA1-proof of s exists”. Back to square one! Wait a second, what if we add a new rule Q1 allowing us to go from R1(s) to s? OK, but now we get R2(s) := “a PA2-proof of s exists”. Hmm, what if we add an infinite number of rules Qk? Fine, but now we get Rω(s) := “a PAω-proof of s exists”. And so on, and so forth; the recursive ordinals are aplenty...
Bottom line, Loeb’s theorem works for any theory containing PA, so we’re stuck.
AI
Suppose you’re trying to build a self-modifying AGI called “Lucy”. Lucy works by considering possible actions and looking for formal proofs that taking one of them will increase expected utility. In particular, she has self-modifying actions in her strategy space. A self-modifying action essentially creates a new agent: Lucy2. How can Lucy decide that becoming Lucy2 is a good idea? Well, a good step in this direction would be proving that Lucy2 would only take actions that are “good”. I.e., we would like Lucy to reason as follows: “Lucy2 uses the same formal system as I do, so if she decides to take action a, it’s because she has a proof p of the sentence s(a) that ‘a increases expected utility’. Since such a proof exists, a does increase expected utility, which is good news!” Problem: Lucy is using L in there, applied to her own formal system! That cannot work! So Lucy would have a hard time self-modifying in a way which doesn’t make her formal system weaker.
As another example where this poses a problem, suppose Lucy observes another agent called “Kurt”. Lucy knows, by analyzing her sensory evidence, that Kurt proves theorems using the same formal system as Lucy. Suppose Lucy finds out that Kurt proved theorem s, but she doesn’t know how. We would like Lucy to be able to conclude that s is, in fact, true (at least with the probability that her model of physical reality is correct). Alas, she cannot.

See MIRI’s paper for more discussion.

Evidence Logic
Here, cousin_it explains a method to assign probabilities to sentences in an inconsistent theory T. It works as follows. Consider a sentence s. Since T is inconsistent, there are T-proofs both of s and of not-s. Well, in a courtroom both sides are allowed to present arguments, so why not try the same approach here? Let’s weight the proofs as a function of their length, analogously to the weighting of hypotheses in Solomonoff induction. That is, suppose we have a prefix-free encoding of proofs as bit sequences. Then it makes sense to consider a random bit sequence and ask whether it is a proof of something. Define the probability of s to be
P(s) := (probability that a random sequence is a proof of s) / (probability that a random sequence is a proof of s or of not-s)
Nice, but it doesn’t solve the Loebian obstacle yet.
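To make the definition concrete, here is a brute-force sketch in Python. It enumerates all bit sequences up to a cutoff length and weights each by 2^{-length}, exactly as in the formula above. The proof checker `is_proof_of` and the tagged-pair representation of not-s are assumptions for the sake of illustration, not anything the post specifies; this shows the weighting scheme, not a practical procedure.

```python
import itertools

def logical_probability(s, is_proof_of, max_len=16):
    """Approximate cousin_it-style P(s).

    Enumerates every bit sequence v with 1 <= len(v) <= max_len and
    weights it by 2^(-len(v)).  With a prefix-free encoding of proofs
    these weights sum to at most 1.

    is_proof_of(bits, sentence) is an assumed black-box checker that
    returns True iff `bits` decodes to a valid T-proof of `sentence`.
    """
    weight_s = 0.0      # total weight of sequences proving s
    weight_not_s = 0.0  # total weight of sequences proving not-s
    for n in range(1, max_len + 1):
        for bits in itertools.product((0, 1), repeat=n):
            if is_proof_of(bits, s):
                weight_s += 2.0 ** (-n)
            elif is_proof_of(bits, ("not", s)):  # assumed encoding of not-s
                weight_not_s += 2.0 ** (-n)
    total = weight_s + weight_not_s
    return weight_s / total if total else None  # undefined if no proofs found
```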
I will now formulate an extension of this idea that allows assigning an interval of probabilities [P_min(s), P_max(s)] to any sentence s. This interval is a sort of “Knightian uncertainty”. I have some speculations on how to extract a single number from this interval in the general case, but even without that, I believe that P_min(s) = P_max(s) in many interesting cases.
First, the general setting:
With every sentence s are associated certain texts v which are considered to be “evidence relevant to s”. These are divided into “positive” and “negative” evidence. We define sgn(v) := +1 for positive evidence and sgn(v) := −1 for negative evidence.
Each piece of evidence v is associated with a strength str_s(v), which is a number in [0, 1].
Each piece of evidence v is associated with an “energy” function e_{s,v} : [0, 1] → [0, 1], which is continuous and convex.
The “total energy” associated with s is defined to be e_s := Σ_v 2^{-l(v)} e_{s,v}, where l(v) is the length of v.
Since the e_{s,v} are continuous and convex, so is e_s. Hence it attains its minimum on a closed interval, which is [P_min(s), P_max(s)] by definition.
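As a minimal numerical sketch (assuming the relevant pieces of evidence have already been collected), the interval can be recovered by minimizing the total energy over [0, 1]; a dense grid search stands in here for an exact convex solver. The toy example uses the proof/disproof energies defined in the next section.

```python
import numpy as np

def probability_interval(energies, weights, grid=100001):
    """Recover [P_min(s), P_max(s)] by grid-searching the total energy
    e_s(q) = sum_v 2^(-l(v)) * e_{s,v}(q) over q in [0, 1].

    `energies` are the per-evidence energy functions e_{s,v} and
    `weights` the corresponding 2^(-l(v)) factors.  Each e_{s,v} is
    convex, so the set of minimisers is a closed interval; the grid
    search below is an illustrative stand-in for exact minimisation.
    """
    q = np.linspace(0.0, 1.0, grid)
    total = sum(w * np.vectorize(e)(q) for e, w in zip(energies, weights))
    minimisers = q[total <= total.min() + 1e-12]
    return minimisers.min(), minimisers.max()

# Toy example: one length-3 proof of s and one length-5 proof of not-s,
# with the proof energies (1 - q)^2 and q^2 from the next section.
energies = [lambda q: (1.0 - q) ** 2, lambda q: q ** 2]
weights = [2.0 ** -3, 2.0 ** -5]
print(probability_interval(energies, weights))  # ~ (0.8, 0.8), a single point
```

Note that on this toy input the minimiser is the single point w1/(w1 + w2) = 0.8, which agrees with cousin_it’s P(s) above.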
Now, the details:
A piece of evidence v for s is defined to be one of the following (a code sketch of this case analysis appears after the list):
a proof of s
sgn(v) := +1
str_s(v) := 1
e_{s,v}(q) := (1 - q)^2
a proof of not-s
sgn(v) := −1
str_s(v) := 1
e_{s,v}(q) := q^2
a piece of positive evidence for the sentence R-+(s, p) := “P_min(s) >= p”
sgn(v) := +1
str_s(v) := str_{R-+(s, p)}(v) · p
e_{s,v}(q) := 0 for q >= p; str_{R-+(s, p)}(v) (q - p)^2 for q < p
a piece of negative evidence for the sentence R--(s, p) := “P_min(s) < p”
sgn(v) := +1
str_s(v) := str_{R--(s, p)}(v) · p
e_{s,v}(q) := 0 for q >= p; str_{R--(s, p)}(v) (q - p)^2 for q < p
a piece of negative evidence for the sentence R++(s, p) := “P_max(s) > p”
sgn(v) := −1
str_s(v) := str_{R++(s, p)}(v) · (1 - p)
e_{s,v}(q) := 0 for q <= p; str_{R++(s, p)}(v) (q - p)^2 for q > p
a piece of positive evidence for the sentence R+-(s, p) := “P_max(s) <= p”
sgn(v) := −1
str_s(v) := str_{R+-(s, p)}(v) · (1 - p)
e_{s,v}(q) := 0 for q <= p; str_{R+-(s, p)}(v) (q - p)^2 for q > p
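The case analysis above translates directly into code. Below is a sketch in which the case labels, and the packaging of each case as a (sgn, strength, energy) triple, are my own conventions; `p` is the threshold in the relevant R sentence and `str_r` the strength of the underlying evidence for it. At q = p both branches give 0, so each energy is continuous (and convex) as required.

```python
def evidence_triple(kind, p=None, str_r=None):
    """Return (sgn, str_s(v), e_{s,v}) for one piece of evidence about s,
    following the six cases listed above.  `p` and `str_r` apply only to
    the four R-sentence cases.
    """
    if kind == "proof_of_s":        # a proof of s
        return +1, 1.0, (lambda q: (1.0 - q) ** 2)
    if kind == "proof_of_not_s":    # a proof of not-s
        return -1, 1.0, (lambda q: q ** 2)
    if kind in ("pos_R-+", "neg_R--"):   # evidence pushing P_min(s) up towards p
        return +1, str_r * p, (lambda q: 0.0 if q >= p else str_r * (q - p) ** 2)
    if kind in ("neg_R++", "pos_R+-"):   # evidence pushing P_max(s) down towards p
        return -1, str_r * (1.0 - p), (lambda q: 0.0 if q <= p else str_r * (q - p) ** 2)
    raise ValueError(f"unknown evidence kind: {kind}")
```

Feeding these energy functions, each weighted by 2^{-l(v)}, into the probability_interval sketch above recovers [P_min(s), P_max(s)].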
Technicality: I suggest that for our purposes, a “proof of s” is allowed to be a proof of a sentence equivalent to s in 0-th order logic (e.g. not-not-s). This ensures that our probability intervals obey the properties we’d like them to obey with respect to propositional calculus.
Now, consider again our self-modifying agent Lucy. Suppose she makes her decisions according to a system of evidence logic like the above. She can now reason along the lines of: “Lucy2 uses the same formal system as I do. If she decides to take action a, it’s because she has strong evidence for the sentence s(a) that ‘a increases expected utility’. I just proved that there would be strong evidence for the expected utility increasing. Therefore, the expected utility would have a high value with high logical probability. But evidence for high logical probability of a sentence is evidence for the sentence itself. Therefore, I now have evidence that expected utility will increase!”
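As a sketch of what such a decision procedure might look like (the function names, the evidence-collection mechanism, and the threshold are all illustrative assumptions; the post does not fix any of them), Lucy could act when even the pessimistic end of the interval clears a bar:

```python
def should_take(action, collect_evidence, threshold=0.95):
    """Hypothetical decision rule for Lucy.

    s(action) is the sentence "action increases expected utility".
    collect_evidence(action) is assumed to return the energy functions
    e_{s(action),v} and their 2^(-l(v)) weights for all evidence
    relevant to s(action); probability_interval is the sketch above.
    """
    energies, weights = collect_evidence(action)
    p_min, _p_max = probability_interval(energies, weights)
    # Act only when the lower logical probability is already high.
    return p_min >= threshold
```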
This analysis is very sketchy, but I think it lends hope that the system leads to the desired results.