I don’t see the need for 2 parameters. The way I formulated it, there is only 1 parameter: the depth of analysis D. I always consider all sentences. This makes sense because all but a finite set of sentences cannot figure in a contradiction of length ≤ D, so all but a finite set of sentences get probability 1⁄2 for any given D.
Regarding convergence of probabilities of undecidable statements as D → infinity, well, I don’t know how to prove it, but I also don’t know how to disprove it. I can try to assign a probability to it… ;) Is the result by Sawin you mentioned published somewhere?
Is it important to the analysis whether the probabilities converge as D tends to infinity? Do you rely on this at any point?
If you need to make sure the probabilities converge, then you could consider something like the following.
First, split the sentences in your system F into “positive” sentences (the ones with no leading “not” symbol, ~, or else an even number of leading “not” symbols) and “negative” sentences (the ones with an odd number of leading “not” symbols). Sort the positive sentences by length, and then lexicographically within each group of sentences of the same length. This gives a list s1, s2, etc.
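For concreteness, a minimal sketch of this enumeration in Python (treating sentences as plain strings with ~ as the negation symbol, which is my own toy encoding):

```python
def is_positive(s: str) -> bool:
    """A sentence is positive iff it has an even number of leading '~'."""
    return (len(s) - len(s.lstrip("~"))) % 2 == 0

def enumerate_positive(sentences):
    """The list s1, s2, ...: positive sentences sorted by length,
    ties broken lexicographically."""
    return sorted((s for s in sentences if is_positive(s)),
                  key=lambda s: (len(s), s))

print(enumerate_positive(["q", "~p", "p", "~~q", "p&q"]))
# -> ['p', 'q', 'p&q', '~~q']
```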
We will now build a growing subset S of sentences, and ensure that in the limit, S is consistent and complete.
At stage 0, S is empty. Call this S0.
At stage n: Toss a fair coin, and if it lands heads then add sn to S. If it lands tails then add ~sn to S.

Next, systematically search through all proofs of length up to the length of sn to see if there are any inconsistencies in S. If a proof of inconsistency is found, then list the subset of positive and negative sentences which create the inconsistency, e.g. {sa, ~sb, ~sc, …, sz}; let k be the largest index in this list (the largest value of a, b, …, z). If sk was in S, then remove it and add ~sk instead, or if ~sk was in S then replace it by sk. Restart the search for inconsistencies. When the search completes without finding any further inconsistencies, we have the set Sn.
We now take the obvious limit Sw of the sets Sn as n tends to infinity: s will belong to Sw if and only if it belongs to all but a finite number of Sn. That limit Sw is guaranteed to exist, because for each m, the process will eventually find a consistent choice of sentences and negations from within {s1, …, sm} and then stick with it (any contradictions found later will cause the replacement of sentences sk or ~sk with k > m). Further, Sw is guaranteed to be consistent (because any contradictions would have been discovered and removed in some Sn). Finally Sw is guaranteed to be complete, since it contains either sm or ~sm for each m.
We can now define probabilities PD(s) = P(s is a member of SD) and take P(s) as the limit of PD(s) as D tends to infinity. The limit is always defined since it is just the probability that s is a member of Sw.
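To make the construction concrete, here is a minimal runnable sketch. Everything about the toy system is my own stand-in: sentences are identified with their sets of satisfying truth assignments over two atoms, and the bounded proof search is replaced by an exact propositional consistency check (so the repair loop flips at most the newly added sentence; in the real construction, where only proofs up to a length bound are searched, older inconsistencies can surface later, which is why the rule flips the largest index in the offending subset).

```python
import random
from itertools import product

ATOMS = ("p", "q")
ASSIGNMENTS = list(product([False, True], repeat=len(ATOMS)))

def models(fn):
    """Identify a sentence with its set of satisfying truth assignments."""
    return frozenset(a for a in ASSIGNMENTS if fn(dict(zip(ATOMS, a))))

# Toy enumeration s1, s2, s3 (a stand-in for the real length/lex ordering):
SENTENCES = [
    models(lambda v: v["p"]),            # s1 = p
    models(lambda v: v["q"]),            # s2 = q
    models(lambda v: v["p"] or v["q"]),  # s3 = p or q
]

def consistent(S):
    """S maps index -> +1 (s_i chosen) or -1 (~s_i chosen); consistent iff
    the chosen sentences share a model."""
    common = set(ASSIGNMENTS)
    for i, sign in S.items():
        m = SENTENCES[i] if sign == +1 else frozenset(ASSIGNMENTS) - SENTENCES[i]
        common &= m
    return bool(common)

def run_stages(rng):
    S = {}
    for n in range(len(SENTENCES)):
        S[n] = +1 if rng.random() < 0.5 else -1  # fair coin: s_n or ~s_n
        while not consistent(S):
            S[max(S)] = -S[max(S)]  # flip the largest index and re-check
    return S

rng = random.Random(0)
runs = [run_stages(rng) for _ in range(100_000)]
for i in range(len(SENTENCES)):
    print(f"P(s{i+1}) ~= {sum(r[i] == +1 for r in runs) / len(runs):.3f}")
# s1 and s2 come out near 1/2; s3 = (p or q) comes out near 3/4, since it is
# forced to true whenever p or q was already chosen.
```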
Convergence is not an issue as long as the utility function is computable.
Soon I’m going to write more about logical uncertainty in the context of the Loebian obstacle.
Ok, I see.
So, I could instead think of iteratively eliminating inconsistent sets of sentences from the measure M, and looking at the limit of M(X)/M(all). I will assume for convenience that we eliminate one inconsistency at a time (so, remove mass from one set of sentences at a time). Let’s call this M{d}, so M{d+1} has one more inconsistent set removed from the measure (which is to say, set to measure 0).
Each set eliminated may take some mass away from the numerator, and will definitely take some mass away from the denominator. (Yet the numerator will always remain less than or equal to the denominator.)
If we remove a set of sentences which does not include X and has no overlap with any of the other sets of sentences removed so far, then the mass of both the numerator and the denominator gets multiplied by 1 - .5^n, where n is the number of sentences in the set. To make this more general, for M{0}(Y) we can think of ~Y as having already been removed from its mass to begin with; therefore, we can say that for any M{d}(Y), if d+1 is based on an inconsistent set S not involving anything removed from M{d}(Y) previously, then M{d+1}(Y) = M{d}(Y) * (1 - .5^|S|).
If S does contain a sentence which has been involved in an inconsistency previously, we remove a strictly smaller fraction. If a subset of S has already been removed, then nothing gets removed at all. But if a set S being removed from M{d}(Y) contains only Y and some other sentences which have never appeared before, then M{d+1}(Y) = M{d}(Y) * (1 - .5^(|S|-1)).
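As a sanity check on these update rules, here is a small brute-force sketch; the setup (four independent sentence slots Y, A, B, C, with M{0} uniform over their joint truth assignments) is my own toy instantiation:

```python
from itertools import product
from fractions import Fraction

NAMES = ("Y", "A", "B", "C")  # four hypothetical, logically independent sentences
mass = {w: Fraction(1, 2 ** len(NAMES))  # M{0}: uniform over truth assignments
        for w in product([False, True], repeat=len(NAMES))}

def matches(w, pattern):
    return all(w[NAMES.index(n)] == v for n, v in pattern.items())

def M(pattern, mass):
    """Mass of the assignments satisfying a sign pattern, e.g. {"Y": True} for M(Y)."""
    return sum(m for w, m in mass.items() if matches(w, pattern))

def eliminate(pattern, mass):
    """Zero out every assignment on which the whole inconsistent set holds."""
    return {w: (Fraction(0) if matches(w, pattern) else m) for w, m in mass.items()}

before = M({"Y": True}, mass)
mass = eliminate({"A": True, "B": False}, mass)  # a 2-sentence set not touching Y
after = M({"Y": True}, mass)
print(after / before)                            # 3/4 = 1 - .5**2

mass = eliminate({"Y": True, "C": True}, mass)   # contains Y plus one fresh sentence
print(M({"Y": True}, mass) / after)              # 1/2 = 1 - .5**(2-1)
```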
Let’s name the ratio M(X)/M(all) at depth d as R(d). Since M(X) cannot be more than M(all), R(d) always lies in the range [0,1] (so it cannot diverge upward). If R(d) converges to a limit greater than zero, then R(d)/R(d+1) must tend to 1.
Assume X and ~X are both consistent.
In some ordering of logical depth, infinitely often we will eliminate a set of the form {Z, Z->~X, X} where neither Z nor Z->~X has ever been seen before in an elimination set. Thus, M{d+1}(X) = .75 M{d}(X).
M{d}(all) = M{d}(X) + M{d}(~X). Nothing will be removed from M{d}(~X). Since X and ~X are consistent, both have some measure. Therefore, M{d+1}(all) = .75 M{d}(X) + M{d}(~X) = M{d}(all) * (.75 R(d) + M{d}(~X)/M{d}(all)) = M{d}(all) * (.75 R(d) + 1 - R(d)) = M{d}(all) * (1 - .25 R(d)).
Thus, infinitely often, R(d+1) = {.75 M{d}(X)} / {M{d}(all) * (1 - .25 R(d))} = .75 R(d) / (1 - .25 R(d))
Let c be R(d+1) - R(d).
So, infinitely often, we have
R(d) + c = .75 R(d) / (1 - .25 R(d))
c = .75 R(d) / (1 - .25 R(d)) - R(d)
If c goes to zero, R(d) must tend to a solution of:
0 = .75 R(d) / (1 - .25 R(d)) - R(d)
R(d) = .75 R(d) / (1 - .25 R(d))
1 = .75 / (1 - .25 R(d)) (dividing by R(d); R(d) = 0 is the other solution)
1 - .25 R(d) = .75
.25 = .25 R(d)
R(d) = 1
So we see that if the probability converges, it must go to either 0 or 1! Unless I’ve made some mistake.
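Numerically, the map R(d+1) = .75 R(d) / (1 - .25 R(d)) has fixed points only at 0 and 1, and iterating it from any start below 1 drives R down toward 0, which matches this conclusion. A quick check:

```python
def f(R):
    """One application of the elimination that hits only X:
    R(d+1) = .75 R(d) / (1 - .25 R(d))."""
    return 0.75 * R / (1 - 0.25 * R)

R = 0.5
for d in range(6):
    print(f"R({d}) = {R:.4f}")
    R = f(R)
# R(0)=0.5000, R(1)=0.4286, R(2)=0.3600, R(3)=0.2967, ... -> 0

print(f(0.0), f(1.0))  # 0.0 and 1.0: the only fixed points of the map
```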
I didn’t follow but the conclusion doesn’t sound right to me. Your argument looks like it should apply to any language and proof system. So if I introduce “X” as a logical symbol that isn’t constrained by any axioms or inference rules, why would its probability go to either 0 or 1? It seems to converge to 1⁄2 since for any consistent truth assignment with X=T we have a symmetric consistent truth assignment with X=F.
Yeah, I suspect that’s right: by my argument, if we introduce X and its probability converges, it must go to either zero or 1, but either option violates the symmetry between X and ~X. Therefore, it must not converge. But this needs to be formalized more before I’m sure.
I think it will converge to 1⁄2 because the symmetry applies at each level of depth, not just in the limit (at least approximately).
To convey my argument without as much of the math:
Suppose that P(X) is 1⁄2 at some stage. Then there will be inconsistent sets yet to remove which will take it at least C away from 1⁄2, where C is a constant that does not go down as the process continues.
The intuition behind this is that removing an inconsistent set whose sentences have not appeared in any of the inconsistencies removed so far removes a fixed fraction of the mass (1 - .5^n for a set of n sentences, per the update rule above). Thus, the mass keeps changing significantly, all the time. Now to make this into a proof we need to show that P(X) changes significantly no matter how far into the process we go; i.e., we need to show that a significantly different amount of mass can be removed from the ‘top’ and the ‘bottom’ (in the fraction M(X) / M(all) at finite depth).
The inconsistency {Y, Y->X, ~X} is supposed to achieve this: it only removes mass from the bottom, but there are infinitely many sets like this (we can make Y arbitrarily complex), and each of them reduces the bottom portion by the same fraction without touching the top. Specifically, the bottom becomes 7/8 of its size (if I’ve done it right), so P(X) becomes roughly .57.
The fraction can re-adjust by decreasing the top in some other way, but this doesn’t allow convergence. No matter how many times the fraction reaches .5 again, there’s a new Y which can be used to force it to .57.
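For the record, the arithmetic behind the .57 figure (with the ~X half of the measure at mass 1⁄2, and the fresh pair Y, Y->X cutting out a quarter of it):

```python
from fractions import Fraction

top = Fraction(1, 2)     # M(X), with P(X) currently 1/2
bottom = Fraction(1)     # M(all), normalized
# {Y, Y->X, ~X} lives entirely inside the ~X half of the measure, so it
# touches only the bottom; with Y and Y->X fresh, it removes (1/2)^2 of
# that half:
bottom -= Fraction(1, 2) * Fraction(1, 4)
print(bottom)                             # 7/8
print(top / bottom, float(top / bottom))  # 4/7 ~= 0.571
```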