Very nice that you’re doing this! I’d like to understand your proposed prior a little better:
1) Does the probability of any given statement converge to a limit as the proof length goes to infinity? That’s trivially true for statements that are proven true or false by PA, but what about independent statements like Con(PA)?
2) Does the probability of any given statement strongly depend on the encoding used for statements, or can we somehow “wash out” that dependence?
I’d like to understand your proposed prior a little better
Me too! I.e., I understand the definition, but not very much about its properties. Your questions are helpful, thanks!
1) Does the probability of a single statement converge to a limit as the proof length goes to infinity?
You mean if we keep the 3^^^3 fixed, but replace 4^^^^4 by larger and larger numbers? Yes: for a particular world X, if (not s_X) is proven by PA at all, it’s proven after some finite number of steps; taking the maximum of these proof lengths over the finitely many worlds X, we see that there is some number we can replace 4^^^^4 by after which nothing changes. But it’s important to keep in mind that the probabilities our agent uses may not be close to that limit! In particular, all but one of the probabilities we force to be >0 will in the limit be 0, because PA can obviously simulate our agent and figure out what it returns.
An interesting followup question is: consider the probability-in-the-limit P_lim(x) of some statement x, as in the previous paragraph. If we now increase the 3^^^3 parameter, does P_lim(x) tend to a limit? Don’t have an answer to that one, so far.
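In symbols (my notation, not from the original post): writing P_{m,n}(x) for the prior with m in the role of 3^^^3 and n in the role of 4^^^^4, the previous paragraph says the inner limit exists, and the followup question is about the outer one:

```latex
\[
P_{\lim}^{(m)}(x) \;=\; \lim_{n\to\infty} P_{m,n}(x) \quad\text{exists;}
\qquad
\text{does } \lim_{m\to\infty} P_{\lim}^{(m)}(x) \text{ exist?}
\]
```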
2) Does the probability of a single statement strongly depend on the encoding used for statements, or can we somehow “wash out” that dependency?
If we keep the 3^^^3 number fixed and allow very weird encodings, it seems likely that we can do ugly things, though I haven’t checked in detail. My intuition would be that if any nice result is to be had at all, it would be of a similar form as those about Kolmogorov complexity—something like, two fixed encodings cannot differ by more than a constant if you let the numeric parameters grow large enough. But that’s just a guess.
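For what it’s worth, here is one way the guessed Kolmogorov-style invariance might be stated (purely my formalization of the guess, not a claim): for any two fixed encodings E and E′ there would be a constant c depending only on the pair, such that for all statements x, once the numeric parameters are large enough,

```latex
\[
c^{-1}\, P^{E'}(x) \;\le\; P^{E}(x) \;\le\; c\, P^{E'}(x).
\]
```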
Oops, our comments crossed. Here are two points I wrote in reply to your original comment:
Nit: I don’t think there’s any particular reason to think that [lim_m lim_n prior at (m,n)] will be equal to [lim_n lim_m prior at (m,n)]. (You wrote about what we get at “(infinity,infinity)”.)
I would hope that if the proposed limit exists at all, you could define it over a nice probability space: I imagine that the individual outcomes (“worlds”) would be all sets of sentences of the language of PA, the sigma algebra of events would be the same as for the probability space of a countable sequence of coin tosses (a standard construction), and the set of all complete theories extending PA would be measurable and have probability 1. (Or maybe that is too much to hope for...)
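A minimal sketch of the space described above, assuming a fixed enumeration phi_1, phi_2, ... of the sentences of the language of PA: identify a world with the set of sentences it makes true, i.e. with a point of {0,1}^N, and take the cylinder sigma-algebra, exactly as for a countable sequence of coin tosses:

```latex
\[
\Omega = \{0,1\}^{\mathbb{N}},
\qquad
\mathcal{F} = \sigma\bigl(\{\omega \in \Omega : \omega_i = 1\} : i \in \mathbb{N}\bigr),
\]
```

so a world omega corresponds to the theory { phi_i : omega_i = 1 }.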
I haven’t thought about it too much more, but I don’t see a promising avenue to follow towards proving that P_lim(x) converges.
Some good news: if I didn’t make a mistake, the set of all complete theories extending PA is indeed a measurable set in the “countable sequence of coin tosses” space; hint: it’s the countable intersection of a sequence of events induced by replacing 3^^^3 by larger and larger numbers. (I’d be happy to expand on this if someone is interested.) Therefore, we have a measurable subspace, on which an appropriate probability measure could exist. Unfortunately, the subspace has probability zero in the original probability space, so you can’t just do a Bayesian update (that would imply a division by zero).
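Spelling out the hint in my own notation: if E_m denotes the event induced by the prior with parameter m in place of 3^^^3, the claim is that each E_m is measurable and

```latex
\[
\{\text{complete consistent theories extending PA}\}
\;=\;
\bigcap_{m=1}^{\infty} E_m,
\]
```

which is then measurable as a countable intersection of measurable sets.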
It just occurred to me that you could sidestep the question and instead define P_lim(x) directly. For example, you could say something like
P_lim(x) = P(PA+x+y is consistent | PA+y is consistent)
if y is randomly chosen from all statements in the language of PA using a simple length-based prior. This way P_lim(x)=1 if PA proves x, and P_lim(x)=0 if PA disproves x, as expected. Then you could view your original definition as a sort of approximation scheme for this one.
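Unpacking that conditional probability with a length-based prior P(y) proportional to 2^{-|y|} (my rendering of the suggestion), it’s the ratio

```latex
\[
P_{\lim}(x) \;=\;
\frac{\sum_{y} 2^{-|y|}\,\bigl[\mathrm{Con}(\mathrm{PA}+x+y)\bigr]}
     {\sum_{y} 2^{-|y|}\,\bigl[\mathrm{Con}(\mathrm{PA}+y)\bigr]},
\]
```

where [·] is 1 if the theory is consistent and 0 otherwise. If PA proves x, the two sums agree term by term and the ratio is 1; if PA disproves x, the numerator vanishes.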
This way of assigning probabilities as “degrees of consistency” doesn’t seem to contain much more information than PA about the “true integers”, but if we just need a prior, something like it could work.
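To make the “degrees of consistency” idea concrete, here is a toy sketch in propositional logic rather than PA, so that consistency is decidable by brute force; the atoms, the particular weights, and all helper names are my own invention, not part of the proposal:

```python
import itertools

ATOMS = ("a", "b", "c")

def models():
    """Every truth assignment to the atoms."""
    for vals in itertools.product((False, True), repeat=len(ATOMS)):
        yield dict(zip(ATOMS, vals))

def consistent(formulas):
    """A finite set of formulas is consistent iff some model satisfies all of them."""
    return any(all(f(m) for f in formulas) for m in models())

def literals():
    """Candidate statements y, each with a crude length-based weight 2^(-length)."""
    for atom in ATOMS:
        yield (lambda m, atom=atom: m[atom]), 0.5         # "a", length 1
        yield (lambda m, atom=atom: not m[atom]), 0.25    # "not a", length 2

def degree_of_consistency(x, background=()):
    """P(T + x + y consistent | T + y consistent), y drawn by the length prior."""
    num = den = 0.0
    for y, weight in literals():
        if consistent(list(background) + [y]):
            den += weight
            if consistent(list(background) + [x, y]):
                num += weight
    return num / den

a = lambda m: m["a"]
not_a = lambda m: not m["a"]
print(degree_of_consistency(a))                      # strictly between 0 and 1
print(degree_of_consistency(a, background=[a]))      # 1.0: "theory proves x"
print(degree_of_consistency(a, background=[not_a]))  # 0.0: "theory disproves x"
```

The analogue of the expected behavior holds: a statement the background theory proves gets probability 1, a refuted one gets 0, and an independent one lands strictly in between.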
It doesn’t seem to me like your prior needs to encode information about the “true integers” any more than it needs to encode information about the world. It just needs to be flexible enough to see the structure that is there.
Maybe this is what you mean by “if we just need a prior,” but it seems like an important point.
While we’re at it, it’s not even clear our prior needs to capture PA. Maybe you could use something much weaker, perhaps even a theory so weak that it can assert its own (cut-free) consistency, and then just learn the structure of PA by observing some computations (or however you gain info about logical facts).
OK, so by taking the limit of priors at (m,n) we get a well-defined prior at (m,infinity). What happens if we take the limit of those, do we get a nice prior at (infinity,infinity) for individual statements (not worlds)?
[ETA: oops, comments crossed; cousin_it originally posted a restatement of my followup question.]
Yes, that’s what I meant. Nit, though: I don’t think there’s any particular reason to think that [lim_m lim_n prior at (m,n)] will be equal to [lim_n lim_m prior at (m,n)].
I would hope that if the proposed limit exists at all, you could define it over a nice probability space: I imagine that the individual outcomes (“worlds”) would be all sets of sentences of the language of PA, the sigma algebra of events would be the same as for the probability space of a countable sequence of coin tosses (a standard construction), and the set of all complete theories extending PA would be measurable and have probability 1. (Or maybe that is too much to hope for...)
Yeah, I guess your “interesting followup question” is what interests me as well...