Oops, our comments crossed. Here are two points I wrote in reply to your original comment:
Nit: I don’t think there’s any particular reason to think that [lim_m lim_n prior at (m,n)] will be equal to [lim_n lim_m prior at (m,n)]. (You wrote about what we get at “(infinity,infinity)”.)
I would hope that if the proposed limit exists at all, you could define it over a nice probability space: I imagine that the individual outcomes (“worlds”) would be all sets of sentences of the language of PA, the sigma algebra of events would be the same as for the probability space of a countable sequence of coin tosses (a standard construction), and the set of all complete theories extending PA would be measurable and have probability 1. (Or maybe that is too much to hope for...)
I haven’t thought about it too much more, but I do not see a promising avenue to follow towards prowing that P_lim(x) converges.
Some good news: if I didn’t make a mistake, the set of all complete theories extending PA is indeed a measurable set in the “countable sequence of coin tosses” space; hint: it’s the countable intersection of a sequence of events induced by replacing 3^^^3 by larger and larger numbers. (I’d be happy to expand on this if someone is interested.) Therefore, we have a measurable subspace, on which an appropriate probability measure could exist. Unfortunately, the subspace has probability zero in the original probability space, so you can’t just do a Bayesian update (that would imply a division by zero).
It just occurred to me that you could sidestep the question and instead define P_lim(x) directly. For example, you could say something like
P_lim(x) = P(PA+x+y is consistent | PA+y is consistent)
if y is randomly chosen from all statements in PA using a simple length-based prior. This way P_lim(x)=1 if PA proves X, and P_lim(x)=0 if PA disproves X, as expected. Then you could view your original definition as a sort of approximation scheme for this one.
This way of assigning probabilities as “degrees of consistency” doesn’t seem to contain much more information than PA about the “true integers”, but if we just need a prior, something like it could work.
It doesn’t seem to me like your prior needs to encode information about the “true integers” any more than it needs to encode information about the world. It just needs to be flexible enough to see the structure that is there.
Maybe this is what you mean by “if we just need a prior,” but it seems like an important point.
While we’re at it, it’s not even clear our prior needs to capture PA. Maybe you could use something much weaker, perhaps even a theory so weak it could assert its own (cut free) consistency, and then just learn the structure of PA by observing some computations (or however you gain info about logical facts).
Oops, our comments crossed. Here are two points I wrote in reply to your original comment:
Nit: I don’t think there’s any particular reason to think that [lim_m lim_n prior at (m,n)] will be equal to [lim_n lim_m prior at (m,n)]. (You wrote about what we get at “(infinity,infinity)”.)
I would hope that if the proposed limit exists at all, you could define it over a nice probability space: I imagine that the individual outcomes (“worlds”) would be all sets of sentences of the language of PA, the sigma algebra of events would be the same as for the probability space of a countable sequence of coin tosses (a standard construction), and the set of all complete theories extending PA would be measurable and have probability 1. (Or maybe that is too much to hope for...)
I haven’t thought about it too much more, but I do not see a promising avenue to follow towards prowing that P_lim(x) converges.
Some good news: if I didn’t make a mistake, the set of all complete theories extending PA is indeed a measurable set in the “countable sequence of coin tosses” space; hint: it’s the countable intersection of a sequence of events induced by replacing 3^^^3 by larger and larger numbers. (I’d be happy to expand on this if someone is interested.) Therefore, we have a measurable subspace, on which an appropriate probability measure could exist. Unfortunately, the subspace has probability zero in the original probability space, so you can’t just do a Bayesian update (that would imply a division by zero).
It just occurred to me that you could sidestep the question and instead define P_lim(x) directly. For example, you could say something like
P_lim(x) = P(PA+x+y is consistent | PA+y is consistent)
if y is randomly chosen from all statements in PA using a simple length-based prior. This way P_lim(x)=1 if PA proves X, and P_lim(x)=0 if PA disproves X, as expected. Then you could view your original definition as a sort of approximation scheme for this one.
This way of assigning probabilities as “degrees of consistency” doesn’t seem to contain much more information than PA about the “true integers”, but if we just need a prior, something like it could work.
It doesn’t seem to me like your prior needs to encode information about the “true integers” any more than it needs to encode information about the world. It just needs to be flexible enough to see the structure that is there.
Maybe this is what you mean by “if we just need a prior,” but it seems like an important point.
While we’re at it, it’s not even clear our prior needs to capture PA. Maybe you could use something much weaker, perhaps even a theory so weak it could assert its own (cut free) consistency, and then just learn the structure of PA by observing some computations (or however you gain info about logical facts).