How about ‘not finite’? Even better, ‘non-halting’ or maybe even...‘not step-wise completable/obtainable’? Something is ’in-finite’ly far away if there is no set of steps that I can execute to obtain it within some determinate amount of time.
You’re repeating my point. It’s no harder to explain what “finite” means than it is to explain what “not finite” means. You take this to mean that the concept of “not finite” is easy, and I take it to mean that concept of “finite” is hard. Cousin it’s recent experience lends credence to my point of view.
For any N, it’s easy to explain to a computer what “cardinality smaller than N” means. It’s hard to explain to a computer what “exists N such that cardinality is smaller than N” means, and this is the source of cousin it’s trouble. The next logical step is to realize for a cognitive illusion the idea that humans have a special insight into what this means, that computers can’t have. For some reason most people aren’t willing to take that step.
If it’s just a cognitive illusion, why is it so hard to find contradictions in axiom systems about integers that were generated by humans intuitively, like PA?
We seem to have a superpower for inventing apparently consistent axiom systems that make “intuitive sense” to us. The fact that we have this superpower is even machine-checkable to a certain degree (say, generate all proofs of up to a million symbols and look for contradictions), but the superpower itself resists attempts at formalization.
We work in axiom systems that have not been proven inconsistent because in the past we have reacted to inconsistencies (such as Russel’s paradox) by abandoning the axioms. I wouldn’t call this a superpower but a bias.
Russel’s paradox is a cliche but it’s really illustrative. The principle of noncontradiction says that, since we have arrived at a false statement (the barber shaves himself and doesn’t shave himself) some of our premises must be wrong. Apparently the incorrect premise is: given a property P, there is a set of elements that satisfy P. What could it possibly mean that this premise is false? Evidently it means that the everyday meaning of words like “property,” “set,” and “element” is not clear enough to avoid contradictions. Why are you so sure that the everyday meaning of words like “number,” “successor,” and “least element” are so much clearer?
Here’s another reason to be unimpressed by the fact that no contradictions in PA have been found. The length of the shortest proof of falsum in a formal system has the same property as the busy beaver numbers: they cannot be bounded above by any computable function. i.e. there is an inconsistent formal system with fewer than 100 axioms in which all theorems with proofs of length smaller than BB(100) are consistent with each other. Why couldn’t PA be such a system?
there is an inconsistent formal system with fewer than 100 axioms in which all theorems with proofs of length smaller than BB(100) are consistent with each other. Why couldn’t PA be such a system?
I suspect that for most “naive” ways of constructing such a system X, the very long proof of inconsistency for X should reduce to a very short proof of inconsistency for X+Con(X), because the latter system should be enough to capture most “informal” reasoning that you used to construct X in the first place. The existence of such a proof wouldn’t imply the inconsistency of X directly (there are consistent systems X such that X+Con(X) is inconsistent, e.g. X=Y+not(Con(Y)) where Y is consistent), but if PA+Con(PA) were ever shown to be inconsistent, that would be highly suggestive and would cause me to abandon my intuitions expressed above.
But as far as we know now, PA+Con(PA) looks just as consistent as PA itself. Moreover, I think you can add a countable-ordinal pile of iterated Con’s on top and still get a system that’s weaker than ZFC. (I’m not 100% confident of that, would be nice if someone corrected me!)
So I’m pretty confident that PA is consistent, conditional on the statement “PA is consistent” being meaningful. Note that you need a notion of “standard integers” to make sense of the latter statement too, because integers encode proofs.
You’ve made good points (and I’ve voted up your remark) but I’d like to note a few issues:
First we can prove the consistency of PA assuming certain other axiomatic systems. In particular, Gentzen’s theorem shows that consistency of PA can be proved if one accepts a system which is incomparable in strength to PA (in the sense that there are theorems in each system which are not theorems in the other).
they cannot be bounded above by any computable function. i.e. there is an inconsistent formal system with fewer than 100 axioms in which all theorems with proofs of length smaller than BB(100) are consistent with each other.
This isn’t necessarily true unless one has a 1-1 correspondence between axiomatic systems and Turing machines, rather than just having axiomatic systems modelable as Turing machines. This is a minor detail that doesn’t impact your point in a substantial fashion.
Second, it isn’t clear how long we expect an average output of a Turing machine to be. I don’t know if anyone has done work on this, but it seems intuitively clear to me that if I pick a random Turing machine with n states, run it on the blank tape, I should expect with a high probability that the Turing machine will halt well before BB(n) or will never halt. Let’s make this claim more precise:
Definition: Let g(f)(n) be the (# of Turing machines with n states which when run on the blank tape either never halt or halt in fewer than f(n) steps) / (# of Turing machines with n states).
Question: Is there is a computable function f(n) such that g(f)(n) goes to 1 as n goes to infinity.
If such a function exists, then we should naively expect that random axiomatic systems will likely to either have an easy contradiction or be consistent. I’m not sure one way or another whether or not this function exists, but my naive guess is that it does.
Question: Is there is a computable function f(n) such that g(f)(n) goes to 1 as n goes to infinity.
Well, let’s say we want to know whether Turing machine T halts. Say T has k states. Can’t we use T’s program to find a family of programs T’ which are essentially the same as T except they consist of a common prefix followed by arbitrary suffix? Let k’ be the number of states in the ‘prefix’. The proportion of programs with ⇐ n states that belong to family T’ must be at least 2^(-k’-1) for all n >= k’.
Now let’s start running all programs and as we go along let’s keep track of whether, for each value of n, a proportion greater than or equal to 1 − 2^(-k’-1) of programs of length ⇐ n have halted in fewer than f(n) steps. Eventually this will be true for some n >= k’. At that point, we check to see whether one of the programs in T’ has halted. If not, then none of the programs in T’ can halt, and neither can T.
Therefore, such a function f could be used to solve the halting problem.
But strictly stronger in consistency strength, of course. (Consistency strength turns out to be more fundamental than logical strength simpliciter.)
Question: Is there is a computable function f(n) such that g(f)(n) goes to 1 as n goes to infinity.
Well, let’s say we want to know whether Turing machine T halts. Say T has k states. Can’t we use T’s program to find a family of programs T’ which are essentially the same as T except they consist of a common prefix followed by arbitrary suffix? Let k’ be the number of states in the ‘prefix’. The proportion of programs with less than n states that belong to family T’ must tend to 2^(-k’) as n goes to infinity. If we choose n such that g(f)(n) > 1 − 2^(-k’) then we just need to run all members of T’ for f(n) steps. If none of them have halted then T does not halt.
Therefore, such a function f could be used to solve the halting problem.
If such a function exists, then we should naively expect that random axiomatic systems will likely to either have an easy contradiction or be consistent. I’m not sure one way or another whether or not this function exists, but my naive guess is that it does.
You must be aware that such halting probabilities are usually uncomputable, right? In any case you’re not going to be surprised that I wouldn’t find any information about this limit of ratios compelling, any more than you would buy my argument that 15 is not prime because most numbers are not prime.
You must be aware that such halting probabilities are usually uncomputable, right?
Yes, but the existence of this function looks weaker than being able to compute Chaitin constants. Am I missing something here?
In any case you’re not going to be surprised that I wouldn’t find any information about this limit of ratios compelling, any more than you would buy my argument that 15 is not prime because most numbers are not prime.
My prior that a random integer is prime is 1/log n . If you give me a large integer, the chance that it is prime is very tiny and that is a good argument for assuming that your random integer really isn’t prime. I’m not sure why you think that’s not a good argument, at least in the context when I can’t verify it (if say the number is too large).
But 1/log(n) takes a long time to get small, so that the argument “15 is not prime because most numbers are not prime” is not very good. It seems even more specious in settings where we have less of a handle on what’s going on at all, such as with halting probabilities.
Are you trying to make a probability argument like this because you scanned my argument as saying “PA is likely inconsistent because a random axiom system is likely inconsistent?” That’s not what I’m trying to say at all.
But 1/log(n) takes a long time to get small, so that the argument “15 is not prime because most numbers are not prime” is not very good. It seems even more specious in settings where we have less of a handle on what’s going on at all, such as with halting probabilities
Sure, in the case of n=15, that’s a very weak argument. And just verifying is better, but the point is the overall thrust of the type of argument is valid Bayesian evidence.
Are you trying to make a probability argument like this because you scanned my argument as saying “PA is likely inconsistent because a random axiom system is likely inconsistent?” That’s not what I’m trying to say at all.
No. I’m confused as to what I said that gives you that impression. If you had said that I’d actually disagree strongly (since what it is a reasonable distribution for “random axiomatic system” is not at all obvious). My primary issue again was with the Turing machine statement, where it isn’t at all obvious how frequently a random Turing machine behaves like a Busy Beaver.
And just verifying is better, but the point is the overall thrust of the type of argument is valid Bayesian evidence.
I think you are being way to glib about the possibility of analyzing these foundational issues with probability. But let’s take for granted that it makes sense—the strength of this “Bayesian evidence” is
P(ratio goes to 1 | PA is inconsistent) / P(ratio goes to 1)
Now, I have no idea what the numerator and denominator actually mean in this instance, but informally speaking it seems to me that they are about the same size.
We can replace those “events” by predictions that I’m more comfortable evaluating using Bayes, e.g. P(JoshuaZ will find a proof that this ratio goes to 1 in the next few days) and P(JoshuaZ will find a proof that this ratio goes to 1 in the next few days | Voevodsky will find an inconsistency in PA in the next 10 years). Those are definitely about the same size.
Sure. There’s an obvious problem with what probabilities mean and how we would even discuss things like Turing machines if PA is inconsistent. One could talk about some model of Turing machines in Robinson arithmetic or the like.
But yes, I agree that using conventional probability in this way is fraught with difficulty.
How about ‘not finite’? Even better, ‘non-halting’ or maybe even...‘not step-wise completable/obtainable’? Something is ’in-finite’ly far away if there is no set of steps that I can execute to obtain it within some determinate amount of time.
You’re repeating my point. It’s no harder to explain what “finite” means than it is to explain what “not finite” means. You take this to mean that the concept of “not finite” is easy, and I take it to mean that concept of “finite” is hard. Cousin it’s recent experience lends credence to my point of view.
For any N, it’s easy to explain to a computer what “cardinality smaller than N” means. It’s hard to explain to a computer what “exists N such that cardinality is smaller than N” means, and this is the source of cousin it’s trouble. The next logical step is to realize for a cognitive illusion the idea that humans have a special insight into what this means, that computers can’t have. For some reason most people aren’t willing to take that step.
If it’s just a cognitive illusion, why is it so hard to find contradictions in axiom systems about integers that were generated by humans intuitively, like PA?
We seem to have a superpower for inventing apparently consistent axiom systems that make “intuitive sense” to us. The fact that we have this superpower is even machine-checkable to a certain degree (say, generate all proofs of up to a million symbols and look for contradictions), but the superpower itself resists attempts at formalization.
We work in axiom systems that have not been proven inconsistent because in the past we have reacted to inconsistencies (such as Russel’s paradox) by abandoning the axioms. I wouldn’t call this a superpower but a bias.
Russel’s paradox is a cliche but it’s really illustrative. The principle of noncontradiction says that, since we have arrived at a false statement (the barber shaves himself and doesn’t shave himself) some of our premises must be wrong. Apparently the incorrect premise is: given a property P, there is a set of elements that satisfy P. What could it possibly mean that this premise is false? Evidently it means that the everyday meaning of words like “property,” “set,” and “element” is not clear enough to avoid contradictions. Why are you so sure that the everyday meaning of words like “number,” “successor,” and “least element” are so much clearer?
Here’s another reason to be unimpressed by the fact that no contradictions in PA have been found. The length of the shortest proof of falsum in a formal system has the same property as the busy beaver numbers: they cannot be bounded above by any computable function. i.e. there is an inconsistent formal system with fewer than 100 axioms in which all theorems with proofs of length smaller than BB(100) are consistent with each other. Why couldn’t PA be such a system?
I suspect that for most “naive” ways of constructing such a system X, the very long proof of inconsistency for X should reduce to a very short proof of inconsistency for X+Con(X), because the latter system should be enough to capture most “informal” reasoning that you used to construct X in the first place. The existence of such a proof wouldn’t imply the inconsistency of X directly (there are consistent systems X such that X+Con(X) is inconsistent, e.g. X=Y+not(Con(Y)) where Y is consistent), but if PA+Con(PA) were ever shown to be inconsistent, that would be highly suggestive and would cause me to abandon my intuitions expressed above.
But as far as we know now, PA+Con(PA) looks just as consistent as PA itself. Moreover, I think you can add a countable-ordinal pile of iterated Con’s on top and still get a system that’s weaker than ZFC. (I’m not 100% confident of that, would be nice if someone corrected me!)
So I’m pretty confident that PA is consistent, conditional on the statement “PA is consistent” being meaningful. Note that you need a notion of “standard integers” to make sense of the latter statement too, because integers encode proofs.
You’ve made good points (and I’ve voted up your remark) but I’d like to note a few issues:
First we can prove the consistency of PA assuming certain other axiomatic systems. In particular, Gentzen’s theorem shows that consistency of PA can be proved if one accepts a system which is incomparable in strength to PA (in the sense that there are theorems in each system which are not theorems in the other).
This isn’t necessarily true unless one has a 1-1 correspondence between axiomatic systems and Turing machines, rather than just having axiomatic systems modelable as Turing machines. This is a minor detail that doesn’t impact your point in a substantial fashion.
Second, it isn’t clear how long we expect an average output of a Turing machine to be. I don’t know if anyone has done work on this, but it seems intuitively clear to me that if I pick a random Turing machine with n states, run it on the blank tape, I should expect with a high probability that the Turing machine will halt well before BB(n) or will never halt. Let’s make this claim more precise:
Definition: Let g(f)(n) be the (# of Turing machines with n states which when run on the blank tape either never halt or halt in fewer than f(n) steps) / (# of Turing machines with n states).
Question: Is there is a computable function f(n) such that g(f)(n) goes to 1 as n goes to infinity.
If such a function exists, then we should naively expect that random axiomatic systems will likely to either have an easy contradiction or be consistent. I’m not sure one way or another whether or not this function exists, but my naive guess is that it does.
Well, let’s say we want to know whether Turing machine T halts. Say T has k states. Can’t we use T’s program to find a family of programs T’ which are essentially the same as T except they consist of a common prefix followed by arbitrary suffix? Let k’ be the number of states in the ‘prefix’. The proportion of programs with ⇐ n states that belong to family T’ must be at least 2^(-k’-1) for all n >= k’.
Now let’s start running all programs and as we go along let’s keep track of whether, for each value of n, a proportion greater than or equal to 1 − 2^(-k’-1) of programs of length ⇐ n have halted in fewer than f(n) steps. Eventually this will be true for some n >= k’. At that point, we check to see whether one of the programs in T’ has halted. If not, then none of the programs in T’ can halt, and neither can T.
Therefore, such a function f could be used to solve the halting problem.
But strictly stronger in consistency strength, of course. (Consistency strength turns out to be more fundamental than logical strength simpliciter.)
Well, let’s say we want to know whether Turing machine T halts. Say T has k states. Can’t we use T’s program to find a family of programs T’ which are essentially the same as T except they consist of a common prefix followed by arbitrary suffix? Let k’ be the number of states in the ‘prefix’. The proportion of programs with less than n states that belong to family T’ must tend to 2^(-k’) as n goes to infinity. If we choose n such that g(f)(n) > 1 − 2^(-k’) then we just need to run all members of T’ for f(n) steps. If none of them have halted then T does not halt.
Therefore, such a function f could be used to solve the halting problem.
You must be aware that such halting probabilities are usually uncomputable, right? In any case you’re not going to be surprised that I wouldn’t find any information about this limit of ratios compelling, any more than you would buy my argument that 15 is not prime because most numbers are not prime.
Yes, but the existence of this function looks weaker than being able to compute Chaitin constants. Am I missing something here?
My prior that a random integer is prime is 1/log n . If you give me a large integer, the chance that it is prime is very tiny and that is a good argument for assuming that your random integer really isn’t prime. I’m not sure why you think that’s not a good argument, at least in the context when I can’t verify it (if say the number is too large).
But 1/log(n) takes a long time to get small, so that the argument “15 is not prime because most numbers are not prime” is not very good. It seems even more specious in settings where we have less of a handle on what’s going on at all, such as with halting probabilities.
Are you trying to make a probability argument like this because you scanned my argument as saying “PA is likely inconsistent because a random axiom system is likely inconsistent?” That’s not what I’m trying to say at all.
Sure, in the case of n=15, that’s a very weak argument. And just verifying is better, but the point is the overall thrust of the type of argument is valid Bayesian evidence.
No. I’m confused as to what I said that gives you that impression. If you had said that I’d actually disagree strongly (since what it is a reasonable distribution for “random axiomatic system” is not at all obvious). My primary issue again was with the Turing machine statement, where it isn’t at all obvious how frequently a random Turing machine behaves like a Busy Beaver.
I think you are being way to glib about the possibility of analyzing these foundational issues with probability. But let’s take for granted that it makes sense—the strength of this “Bayesian evidence” is
P(ratio goes to 1 | PA is inconsistent) / P(ratio goes to 1)
Now, I have no idea what the numerator and denominator actually mean in this instance, but informally speaking it seems to me that they are about the same size.
We can replace those “events” by predictions that I’m more comfortable evaluating using Bayes, e.g. P(JoshuaZ will find a proof that this ratio goes to 1 in the next few days) and P(JoshuaZ will find a proof that this ratio goes to 1 in the next few days | Voevodsky will find an inconsistency in PA in the next 10 years). Those are definitely about the same size.
Sure. There’s an obvious problem with what probabilities mean and how we would even discuss things like Turing machines if PA is inconsistent. One could talk about some model of Turing machines in Robinson arithmetic or the like.
But yes, I agree that using conventional probability in this way is fraught with difficulty.