So, first of all, there isn’t really a particular formal proof system in the paper, except sort of implicitly; all we get is a function P(.) from L’-sentences to real numbers. However, we can talk about the first-order theory you obtain by taking the base theory T and adding to it the reflection axioms, i.e. the axioms a < P(‘G’) < b for all G such that the function P(.) satisfies a < P(G) < b. [For anyone not paying very close attention, P(‘.’) is a function symbol in the formal language, while P(.) is a meta-language function from the formal language to the real numbers.] The main theorem is that P(.) assigns probability 1 to this theory (and therefore to all consequences of this theory), so it is interesting to ask what it entails.
Your question, if I understand correctly, is whether this theory does in particular entail
[-1 < P(‘G’) < 1] --> [P(‘-1 < P(‘G’) < 1′) = 1]
(note that this is simply a sentence in the language L’, not a meta-language statement). There’s no obvious reason that the system should be able to prove this—all we have is T and the individual axioms of the form [a < P(‘G’) < b] --, and indeed your proof shows that (unless there is an inconsistency in the paper) this is not a theorem of the system: if it were, then so would be P(‘G’) = 1, and therefore ¬G, leading to a contradiction since P(.) assigns probability 1 to all theorems of this system.
In short, yes, there are true statements about P(.) such that the corresponding statement about P(‘.’) is false in the formal system, and assigned zero probability by P(.). And yes, I do believe this is worth keeping in mind.
Here is my current preferred way of thinking about this: In non-standard analysis, there is a “standard” operator st(.) which takes a non-standard real number x and returns the standard number st(x) that x is infinitesimally close to (assuming one exists, which is the case if x is not unbounded). This operator can also be applied to functions f; st(f)(x) is the standard number which f(x) is infinitesimally close to. I imagine that there is a collection of “possible” non-standard models, each of which fulfills all the axioms that P(.) assigns probability 1, and that one of these possible non-standard models is the true one describing the real world. Then, P(.) is st(P(‘.’)), where P(‘.’) is the function in the true non-standard model. [ETA: This is always defined because we have −1 < P(‘G’) < 2, so every value of P(‘.’) is infinitesimally close to a standard number.]
How could the real world be described by a non-standard model, you ask? The following is rather handwavy, but I’m imagining the agent as computing an infinite sequence of approximations P_i(.) to the intended distribution, and the true model to be the ultrapower of some standard model of ZFC, with P(‘.’) being interpreted as P_i(.) in the i’th component of the ultrapower. Whether any leverage can be gotten out of making this idea formal, I don’t know. (And it’s certainly true that to pull off this intuition, I still need to have a free ultrafilter falling out of the sky.)
Ok, for [a < P(‘G’) < b] I see why you’d use a schema (because it’s an object level axiom, not a meta-language axiom).
But this still seems possibly problematic. We know that adding axioms like [-1 < P(‘G’) < 1] --> [P(‘-1 < P(‘G’) < 1′) = 1], would break the system. But I don’t see a reason to suppose that the other reflection axioms don’t break the system. It might or it might not, but I’m not sure; was there a proof along the lines of “this system is inconsistent if and only if the initial system is” or something?
I’m not sure whether I’m misunderstanding your point, but the paper proves that there is a coherent probability distribution P(.) that assigns probability 1 to both T and to the collection of reflection axioms [a < P(‘G’) < b] for P(.); this implies that there is a probability distribution over complete theories assigning probability 1 to (T + the reflection axioms for P). But if (T + the reflection axioms for P) were inconsistent, then there would be no complete theory extending it, so this would be the empty event and would have to be assigned probability 0 by any coherent probability distribution. It follows that (T + the reflection axioms for P) is consistent. (NB: by “the reflection axioms for P(.)”, I only mean the appropriate instances of [a < P(‘G’) < b], not anything that quantifies over a, b or G inside the object language.)
So, first of all, there isn’t really a particular formal proof system in the paper, except sort of implicitly; all we get is a function P(.) from L’-sentences to real numbers. However, we can talk about the first-order theory you obtain by taking the base theory T and adding to it the reflection axioms, i.e. the axioms a < P(‘G’) < b for all G such that the function P(.) satisfies a < P(G) < b. [For anyone not paying very close attention, P(‘.’) is a function symbol in the formal language, while P(.) is a meta-language function from the formal language to the real numbers.] The main theorem is that P(.) assigns probability 1 to this theory (and therefore to all consequences of this theory), so it is interesting to ask what it entails.
Your question, if I understand correctly, is whether this theory does in particular entail
[-1 < P(‘G’) < 1] --> [P(‘-1 < P(‘G’) < 1′) = 1]
(note that this is simply a sentence in the language L’, not a meta-language statement). There’s no obvious reason that the system should be able to prove this—all we have is T and the individual axioms of the form [a < P(‘G’) < b] --, and indeed your proof shows that (unless there is an inconsistency in the paper) this is not a theorem of the system: if it were, then so would be P(‘G’) = 1, and therefore ¬G, leading to a contradiction since P(.) assigns probability 1 to all theorems of this system.
In short, yes, there are true statements about P(.) such that the corresponding statement about P(‘.’) is false in the formal system, and assigned zero probability by P(.). And yes, I do believe this is worth keeping in mind.
Here is my current preferred way of thinking about this: In non-standard analysis, there is a “standard” operator st(.) which takes a non-standard real number x and returns the standard number st(x) that x is infinitesimally close to (assuming one exists, which is the case if x is not unbounded). This operator can also be applied to functions f; st(f)(x) is the standard number which f(x) is infinitesimally close to. I imagine that there is a collection of “possible” non-standard models, each of which fulfills all the axioms that P(.) assigns probability 1, and that one of these possible non-standard models is the true one describing the real world. Then, P(.) is st(P(‘.’)), where P(‘.’) is the function in the true non-standard model. [ETA: This is always defined because we have −1 < P(‘G’) < 2, so every value of P(‘.’) is infinitesimally close to a standard number.]
How could the real world be described by a non-standard model, you ask? The following is rather handwavy, but I’m imagining the agent as computing an infinite sequence of approximations P_i(.) to the intended distribution, and the true model to be the ultrapower of some standard model of ZFC, with P(‘.’) being interpreted as P_i(.) in the i’th component of the ultrapower. Whether any leverage can be gotten out of making this idea formal, I don’t know. (And it’s certainly true that to pull off this intuition, I still need to have a free ultrafilter falling out of the sky.)
Ok, for [a < P(‘G’) < b] I see why you’d use a schema (because it’s an object level axiom, not a meta-language axiom).
But this still seems possibly problematic. We know that adding axioms like [-1 < P(‘G’) < 1] --> [P(‘-1 < P(‘G’) < 1′) = 1], would break the system. But I don’t see a reason to suppose that the other reflection axioms don’t break the system. It might or it might not, but I’m not sure; was there a proof along the lines of “this system is inconsistent if and only if the initial system is” or something?
I’m not sure whether I’m misunderstanding your point, but the paper proves that there is a coherent probability distribution P(.) that assigns probability 1 to both T and to the collection of reflection axioms [a < P(‘G’) < b] for P(.); this implies that there is a probability distribution over complete theories assigning probability 1 to (T + the reflection axioms for P). But if (T + the reflection axioms for P) were inconsistent, then there would be no complete theory extending it, so this would be the empty event and would have to be assigned probability 0 by any coherent probability distribution. It follows that (T + the reflection axioms for P) is consistent. (NB: by “the reflection axioms for P(.)”, I only mean the appropriate instances of [a < P(‘G’) < b], not anything that quantifies over a, b or G inside the object language.)