This would give you a uniform prior over digits when asking about the nth digit of \pi.
This is very likely a non sequitur. But yes, I agree that in this formalism it’s tough to express the probability of a sentence.
free-standing assumptions (unbound variables) being used only for things like empirical hypotheses, as opposed to computations we just haven’t managed to do.
If this mean what I think it means, then yes, this was my interpretation too—plus assuming the truth or falsity of propositions that are undecided by the axioms.
In light of “that stuff” (probabilities over digits of \pi) being largely separate from a full way of establishing probability values for arbitrary logical formulas
I would like to reiterate that there is a difference between assigning probabilities and the project of logical probability. Probabilities have to follow the product rule, which is contains modus ponens in the case of certainty. If you assign probability 1 to the axioms, there is only one correct distribution for anything that follows from the axioms, like a digit of pi (or as the kids are calling it these days, $\pi$): probability 1 for the right answer, 0 for the wrong answers. If you want logical probabilities to deviate from this (like by being ignorant about some digit of pi), then logical probabilities cannot follow the same rules as probabilities.
I will probably go look up Kolmogorov’s method for constructing measures over countably infinite sets and write an entire article on using Kolmogorov measures over arbitrary inductive types and equational reasoning to build distributions over non-normalized computational terms.
Cool. If you use small words, then I will be happy :)
If this mean what I think it means, then yes, this was my interpretation too—plus assuming the truth or falsity of propositions that are undecided by the axioms.
Or assigning degrees of belief to the axioms themselves! Any mutually-consistent set of statements will allow some probability assignment in which each statement has a nonzero degree of belief.
If you assign probability 1 to the axioms, there is only one correct distribution for anything that follows from the axioms, like a digit of pi (or as the kids are calling it these days, $\pi$): probability 1 for the right answer, 0 for the wrong answers. If you want logical probabilities to deviate from this (like by being ignorant about some digit of pi), then logical probabilities cannot follow the same rules as probabilities.
Which is actually an issue I did bring up, in proof-theoretic language. Let’s start by setting the term “certain distribution” to mean “1.0 to the right answer, 0.0 to everything else”, and then set “uncertain distribution” to be “all distributions wider than this”.
Except that actually means we’re talking about something like a relationship between head-normal-form proof objects and non-normalized proof objects,
A non-normalized proof object (read: the result of a computation we haven’t done yet, a lazily evaluated piece of data) has an uncertain distribution over values. To head-normalize a proof object means to evaluate enough of it to decide (compute the identity of, with certainty) the outermost introduction rule (outermost data constructor), but (by default) this still leaves us uncertain of the premises which were given to that introduction rule (the parameters passed to the data constructor). As we proceed down the levels of the tree of deduction rules, head-normalizing as we go, we eventually arrive to a completely normalized proof-term/piece of data, for which no computation remains undone.
Only a completely normalized proof term has a certain distribution: 1.0 for the right answer, 0.0 for everything else. In all other cases, we have an uncertain distribution, albeit one in which all the probability mass may be allocated in the particular region associated with only one outermost introduction rule (the special case of head-normalized terms).
Remember, these distributions are being assigned over proof terms (ie: constructions in some kind of lambda calculus), not over syntactic values (well-typed and fully normalized terms). The distributions express our state of knowledge given the limited certainty granted by however many levels of the tree (all terms of inductive types can be written as trees) are already normalized to data constructors—they express uncertain states of knowledge in situations where the uncertainty derives from having a fraction of the computational power necessary to decide a decidable question.
So when we’re talking about “digits of $\pi$” sorts of problems, we should actually speak of distributions over computational terms, not “logical probabilities”. This problem has nothing to do with assigning probabilities to sentences in first-order classical logic: by the time we construct and reason with our distributions over computational terms, we have already established the existence of a proof object inhabiting a particular type. If that type happened to be a mere proposition, then we’ve already established, by the time we construct this distribution, that we believe in the truth of the proposition, and are merely reasoning under discrete uncertainty over how its truth was proof-theoretically asserted.
Whereas I think that establishing probabilities for sentences which are not decided by our given axioms may be more difficult, particularly when those sentences may be decidable by other means.
This is very likely a non sequitur. But yes, I agree that in this formalism it’s tough to express the probability of a sentence.
If this mean what I think it means, then yes, this was my interpretation too—plus assuming the truth or falsity of propositions that are undecided by the axioms.
I would like to reiterate that there is a difference between assigning probabilities and the project of logical probability. Probabilities have to follow the product rule, which is contains modus ponens in the case of certainty. If you assign probability 1 to the axioms, there is only one correct distribution for anything that follows from the axioms, like a digit of pi (or as the kids are calling it these days, $\pi$): probability 1 for the right answer, 0 for the wrong answers. If you want logical probabilities to deviate from this (like by being ignorant about some digit of pi), then logical probabilities cannot follow the same rules as probabilities.
Cool. If you use small words, then I will be happy :)
Or assigning degrees of belief to the axioms themselves! Any mutually-consistent set of statements will allow some probability assignment in which each statement has a nonzero degree of belief.
Which is actually an issue I did bring up, in proof-theoretic language. Let’s start by setting the term “certain distribution” to mean “1.0 to the right answer, 0.0 to everything else”, and then set “uncertain distribution” to be “all distributions wider than this”.
A non-normalized proof object (read: the result of a computation we haven’t done yet, a lazily evaluated piece of data) has an uncertain distribution over values. To head-normalize a proof object means to evaluate enough of it to decide (compute the identity of, with certainty) the outermost introduction rule (outermost data constructor), but (by default) this still leaves us uncertain of the premises which were given to that introduction rule (the parameters passed to the data constructor). As we proceed down the levels of the tree of deduction rules, head-normalizing as we go, we eventually arrive to a completely normalized proof-term/piece of data, for which no computation remains undone.
Only a completely normalized proof term has a certain distribution: 1.0 for the right answer, 0.0 for everything else. In all other cases, we have an uncertain distribution, albeit one in which all the probability mass may be allocated in the particular region associated with only one outermost introduction rule (the special case of head-normalized terms).
Remember, these distributions are being assigned over proof terms (ie: constructions in some kind of lambda calculus), not over syntactic values (well-typed and fully normalized terms). The distributions express our state of knowledge given the limited certainty granted by however many levels of the tree (all terms of inductive types can be written as trees) are already normalized to data constructors—they express uncertain states of knowledge in situations where the uncertainty derives from having a fraction of the computational power necessary to decide a decidable question.
So when we’re talking about “digits of $\pi$” sorts of problems, we should actually speak of distributions over computational terms, not “logical probabilities”. This problem has nothing to do with assigning probabilities to sentences in first-order classical logic: by the time we construct and reason with our distributions over computational terms, we have already established the existence of a proof object inhabiting a particular type. If that type happened to be a mere proposition, then we’ve already established, by the time we construct this distribution, that we believe in the truth of the proposition, and are merely reasoning under discrete uncertainty over how its truth was proof-theoretically asserted.
Whereas I think that establishing probabilities for sentences which are not decided by our given axioms may be more difficult, particularly when those sentences may be decidable by other means.
Thanks for explaining more!