Agree about the theorem. Not sure about the implication for explaining integers, since a human who has been assigned the task of decoding the string should have the same difficulty with that infinite input stream. His understanding of what an integer is doesn’t protect him. What seems to me to protect the human from being mentally trapped by an endless input is that his tolerance for the task is finite and eventually he’ll quit. If that’s all it is, then what you need to teach the computer is impatience.
It still feels very weird to me that we cannot explain our intuitive notion of “finiteness” to a machine. Maybe this shows that we don’t understand finiteness very well yet.
Of course we can explain it to a machine, just as we explain it to a person. By using second-order concepts (like “smallest set of thingies closed under zero and successor”).
Of course then we need to leave some aspects of those second-order concepts unexplained and ambiguous—for both machines and humans.
I don’t understand what you’re referring to in your second sentence. Can you elaborate? What sorts of things need to be ambiguous?
By ‘ambiguous’, I meant to suggest the existence of multiple non-isomorphic models.
The thing that puzzled cousin_it was that the axioms of first-order Peano arithmetic can be satisfied by non-standard models of arithmetic, and that there is no way to add additional first-order axioms to exclude these unwanted models.
The solution I proposed was to use a second-order axiom of induction—working with properties (i.e. sets) of numbers rather than first-order predicates over numbers. This approach successfully excludes all the non-standard models of arithmetic, leaving only the desired standard model of cardinality aleph-nought. But it extends the domain of discourse from numbers alone to both numbers and sets of numbers. And now we are left with the ambiguity of what model of sets of numbers we want to use.
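Spelled out, the second-order induction axiom is a single sentence quantifying over all subsets of the domain, rather than a schema over first-order formulas:

```latex
\forall P \subseteq \mathbb{N}\;
  \bigl[\, P(0) \,\wedge\, \forall n\,\bigl(P(n) \rightarrow P(S(n))\bigr)
  \;\rightarrow\; \forall n\, P(n) \,\bigr]
```

Dedekind’s categoricity theorem shows that any two structures satisfying this (together with the successor axioms) are isomorphic, which is exactly why the non-standard models disappear; the cost, as noted, is that “all subsets” itself now needs interpreting.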
It is mildly amusing that in the case of arithmetic, the unwanted non-standard models were all too big, but in the case of set theory, people seem to prefer to think of the large models as standard and dismiss Gödel’s constructive set theory as an aberration.
Depends what you mean by ‘large’, I suppose. A non-well-founded model of ZFC is ‘larger’ than the well-founded submodel it contains (in the sense that it properly contains its well-founded submodel), but it certainly isn’t “standard”.
By Gödel’s constructive set theory, are you talking about set theory plus the axiom of constructibility (V=L)? V=L is hardly ‘dismissed as an aberration’, any more than the field axioms are an ‘aberration’. But just as there’s more scope for a ‘theory of rings’ than a ‘theory of fields’, adding V=L as an axiom (and making a methodological decision to refrain from exploring universes where it fails) has the effect of truncating the hierarchy of large cardinals: everything above zero-sharp becomes inconsistent.
Furthermore, the picture of L sitting inside V that emerges from the study of zero-sharp is so stark and clear that set theorists will never want to let it go. (“No one will drive us from the paradise which Jack Silver has created for us”.)
Thank you for the clarification.
You’ve thought more about explaining things to machines than I have, but I’m not sure what you consider to count as having successfully explained a concept to a machine. For example, if you tell the machine that any string with both a beginning and an end is finite, then you’ve given the machine at least some sort of explanation of finiteness—one which I presume you consider to be inadequate. But when I try to imagine what you might find to be inadequate about it, my naive guess is that you’re thinking something like this: “even if you tell a machine that a finite string has an end, that won’t help the machine to decide whether a given input has reached its end”. In other words, the definition doesn’t give the machine the ability to identify infinite strings as such, or to identify finite strings as such in a time shorter than the length of the string itself. But my response to this is that our own human understanding of finiteness doesn’t give us that ability either. Just for starters, we don’t know whether the universe goes on forever in all directions or closes in on itself as a four-dimensional ball—and we might never know.
I’m not sure what you consider to count as having successfully explained a concept to a machine
I want it to be able to do some of the things that humans can do, e.g. arrive at the conclusion that the Paris-Harrington theorem is “true” even though it’s independent of PA. Humans manage that by having a vague unformalized concept of “standard integers” over which it is true, and then inventing axioms that fit their intuition. So I’m kicking around different ways to make the “standard integers” more clear.
Considering that Paris-Harrington is derivable from second-order arithmetic, do you think any of the ideas from reverse mathematics might come into play? This paper might be of interest if you aren’t very familiar with reverse mathematics but would like to know more.
Also, it’s my intuition that a good deal of mathematics has more to say about human cognition than it does about anything else. That said, this seems like the sort of problem that should be tackled from a computational neuroscience angle first and foremost. I’m likely wrong about the ‘first and foremost’ part, but it seems like something on numerical cognition could help out.
Also, have you looked at this work? I don’t care for the whole ‘metaphor’ camp of thought (I view it as sort of a ripoff of the work on analogies), but there seem to be a few ideas that could be distilled here.
I’m familiar with reverse mathematics and it is indeed very relevant to what I want. Not so sure about Lakoff. If you see helpful ideas in his paper, could you try to distill them?
I could give it a shot. Technically I think they are Rafael Nunez’s ideas more than Lakoff’s (though they are framed in Lakoff’s metaphorical framework). The essential idea is that mathematics is built directly from certain types of embodied cognition, and that the feeling of intuitiveness for things like limits comes from the association of the concept with certain types of actions/movements. Nunez’s papers seem to have the central goal of framing as much mathematics as possible into an embodied cognition framework.
I’m really not sure how useful these ideas are, but I’ll give it another look through. I think that at most there might be the beginnings of some useful ideas, but I get the impression that Nunez’s mathematical intuition is not top notch, which makes his ideas difficult to evaluate when he tries to go further than calculus.
Fortunately, his stuff on arithmetic appears to be the most developed, so if there is something there I think I should be able to find it.
Having now looked up the PH theorem, I don’t understand what you mean. Do you disagree with any of the following?
Computers can prove Paris-Harrington just as easily as humans can. They can also prove the strong Ramsey result that is the subject of PH as easily as humans can.
Both humans and computers are incapable of proving the Ramsey result within Peano arithmetic. Both are capable of proving it in some stronger formal systems.
Both humans and computers can “see that the Ramsey result is true” in the sense that they can verify that a certain string of symbols is a valid proof in a formal system. They are both equally capable of verifying that the Ramsey result (which concerns finite sets of integers) is true by experiment. Neither a human nor a computer can “see that the Ramsey result is true” in any stronger sense.
I agree with everything in your comment except the last sentence. Sorry for being cryptic, I think this still gets the point across :-)
Can I attempt a translation/expansion for Sewing-Machine of why you disagree with the last sentence?
It seems that there’s an intuition among humans that the Ramsey result is true, in the sense that PA + PH captures our intuition of the integers more closely than PA + ~PH given the second order result. What you want is a computer to be able to make that sort of intuitive reasoning or to make it more precise. Is that more or less the point?
We can all agree that human intuition is grand but not magical, I hope? Here is my point of view: you are having difficulty teaching a computer to “make that sort of intuitive reasoning” because that sort of reasoning is not quite right.
“That sort of reasoning” is a good heuristic for discovering true facts about the world (for instance, discovering interesting sequences of symbols that constitute a formal proof of the Paris-Harrington theorem), and to that extent it surely can be taught to a computer. But it does not itself express a true fact about the world, and because of that you are limited in your ability to make it part of the premises on which a computer operates (such as the limitation discussed in the OP).
So I’ve been thinking lately, anyway.
I’m really at a loss as to why such a thing should be intuitive. The additional condition seems to me to be highly unnatural; Ramsey’s theorem is a purely graph-theoretic result, and this strengthened version involves comparing the number of vertices used to numbers that the vertices happen to correspond to, a comparison we would ordinarily consider meaningless.
If I’m following cousin_it, the idea doesn’t really have anything to do with the statement about Ramsey numbers. As I understand it, if in some system only slightly stronger than PA we can show some statement S of the form ∀x ∈ N, P(x), then we should believe that the correct models of PA are those in which S is true. Or to put it differently, we should think PA + S will do a better job telling us about reality than PA + ~S would. I’m not sure this can be formalized beyond that. Presumably if he had a way to formalize this, cousin_it wouldn’t have an issue with it.
This reduces the problem of explaining “standard integers” to the problem of explaining “subsets”, which is not easier. I don’t think there’s any good first-order explanation of what a “subset” is. For example, your definition fails to capture “finiteness” in some weird models of ZFC. More generally, I think “subsets” are a much more subtle concept than “standard integers”. For example, a human can hold the position that all statements in the arithmetical hierarchy have well-defined (though unknown) truth values over the “standard integers”, and at the same time think that the continuum hypothesis is “neither true nor false” because it quantifies over all subsets of the same integers. (Scott Aaronson defends something like this position here.)
Yes, but Subsets(x,y) is a primitive relationship in ZFC. I don’t really know what cousin_it means by an explanation, but assuming it’s something like a first-order definition formula, nothing like that exists in ZFC that doesn’t subsume the concept in the first place.
No, it isn’t. The only primitive relations in ZFC are set membership and possibly equality (depending on how you prefer it). “x is a subset of y” is defined to mean “for all z, z in x implies z in y”.
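In symbols:

```latex
x \subseteq y \;:\Longleftrightarrow\; \forall z\,(z \in x \rightarrow z \in y)
```

So Subsets(x,y) is a defined notion, one quantifier over ∈ deep, rather than a primitive of the theory.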
Can I downvote myself? Somehow my mind switched “subset” and “membership”, and by virtue of ZFC being a one-sorted theory, lo and behold, I wrote the above absurdity. Anyway, to rewrite the sentence and make it less wrong: Subsets(x,y) is defined by means of a first-order formula through the membership relation, which in a one-sorted theory already carries the idea of ‘subsetting’: x ∈ y → {x} ⊆ y. So subsetting can be seen as a transfinite extension of the membership relation, and in ZFC we get no more clarity or computational intuition from the first than from the second.
It still feels very weird to me that we cannot explain our intuitive notion of “finiteness” to a machine. Maybe this shows that we don’t understand finiteness very well yet.
There’s actually a reason for that—it’s impossible to define “finite” in first-order logic. There’s no set of first-order statements that’s true in all finite models and false in all infinite models.
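This follows from the compactness theorem. Suppose a set of sentences $\Phi$ held in exactly the finite models, and let $\lambda_n$ say “there are at least n distinct elements”:

```latex
\lambda_n \;=\; \exists x_1 \cdots \exists x_n \bigwedge_{1 \le i < j \le n} x_i \neq x_j
```

Since $\Phi$ holds in every finite model, every finite subset of $\Phi \cup \{\lambda_n : n \ge 1\}$ is satisfied by a large enough finite model; by compactness the whole set then has a model, which satisfies $\Phi$ yet is infinite, a contradiction.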
Well, now you are exhibiting the (IMHO regrettable) tendency of the more mathematical LWers to put way too much focus on incompleteness, uncomputability, and other negative results that have negligible chance of actually manifesting unless you are specifically looking for pathological cases.
I use second-order logic all the time. Will work fine for this purpose.
I think I recall reading somewhere that you only need first-order logic to define Turing machines and computer programs in general, which seems to suggest that “not expressible in first-order logic” means “uncomputable”… I could just be really confused about this though...
Anyway, for some reason or other I had the impression that “not expressible in first-order logic” is a property that might have something in common with “hard to explain to a machine”.
If we could explain what “finite” meant, then we could explain what “infinite” meant. More and more I’m starting to believe “infinite” doesn’t mean anything.
How about ‘not finite’? Even better, ‘non-halting’, or maybe even... ‘not step-wise completable/obtainable’? Something is ‘in-finite’ly far away if there is no set of steps I can execute to obtain it within some determinate amount of time.
You’re repeating my point. It’s no harder to explain what “finite” means than it is to explain what “not finite” means. You take this to mean that the concept of “not finite” is easy, and I take it to mean that the concept of “finite” is hard. cousin_it’s recent experience lends credence to my point of view.
For any N, it’s easy to explain to a computer what “cardinality smaller than N” means. It’s hard to explain to a computer what “there exists an N such that the cardinality is smaller than N” means, and this is the source of cousin_it’s trouble. The next logical step is to recognize as a cognitive illusion the idea that humans have a special insight into what this means that computers can’t have. For some reason most people aren’t willing to take that step.
If it’s just a cognitive illusion, why is it so hard to find contradictions in axiom systems about integers that were generated by humans intuitively, like PA?
We seem to have a superpower for inventing apparently consistent axiom systems that make “intuitive sense” to us. The fact that we have this superpower is even machine-checkable to a certain degree (say, generate all proofs of up to a million symbols and look for contradictions), but the superpower itself resists attempts at formalization.
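The “machine-checkable to a certain degree” part can at least be illustrated in a toy setting. The sketch below is a minimal propositional resolution loop, entirely my own construction and nothing to do with PA proper (propositional consistency is decidable in a way PA-consistency is not); it saturates a finite clause set and reports whether the empty clause, i.e. a contradiction, is derivable:

```python
from itertools import combinations

def resolvents(c1, c2):
    """All clauses obtained by resolving c1 against c2 on one literal."""
    return [frozenset((c1 - {lit}) | (c2 - {-lit}))
            for lit in c1 if -lit in c2]

def consistent(clauses):
    """Saturate a finite propositional clause set under resolution.

    Literals are nonzero ints; -n is the negation of n. Returns False
    iff the empty clause (falsum) is derivable, i.e. the set is
    inconsistent. Terminates because only finitely many clauses exist
    over the variables mentioned.
    """
    known = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for a, b in combinations(known, 2):
            for r in resolvents(a, b):
                if not r:                 # empty clause: contradiction found
                    return False
                if r not in known:
                    new.add(r)
        if not new:                       # saturated with no contradiction
            return True
        known |= new
```

For example, `consistent([{1, 2}, {-1}, {-2}])` is False (resolution derives {2}, then falsum), while `consistent([{1, 2}, {-1, 2}])` is True.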
We work in axiom systems that have not been proven inconsistent because in the past we have reacted to inconsistencies (such as Russell’s paradox) by abandoning the axioms. I wouldn’t call this a superpower but a bias.
Russell’s paradox is a cliché but it’s really illustrative. The principle of noncontradiction says that, since we have arrived at a false statement (the barber shaves himself and doesn’t shave himself), some of our premises must be wrong. Apparently the incorrect premise is: given a property P, there is a set of elements that satisfy P. What could it possibly mean that this premise is false? Evidently it means that the everyday meaning of words like “property,” “set,” and “element” is not clear enough to avoid contradictions. Why are you so sure that the everyday meanings of words like “number,” “successor,” and “least element” are so much clearer?
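In set-theoretic rather than barber form, unrestricted comprehension applied to the property “x is not a member of itself” gives

```latex
R \;=\; \{\, x : x \notin x \,\}, \qquad R \in R \;\leftrightarrow\; R \notin R
```

which is the formal counterpart of the barber gloss above.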
Here’s another reason to be unimpressed by the fact that no contradictions in PA have been found. The length of the shortest proof of falsum in a formal system has the same property as the busy beaver numbers: it cannot be bounded above by any computable function. I.e., there is an inconsistent formal system with fewer than 100 axioms in which all theorems with proofs of length smaller than BB(100) are consistent with each other. Why couldn’t PA be such a system?
there is an inconsistent formal system with fewer than 100 axioms in which all theorems with proofs of length smaller than BB(100) are consistent with each other. Why couldn’t PA be such a system?
I suspect that for most “naive” ways of constructing such a system X, the very long proof of inconsistency for X should reduce to a very short proof of inconsistency for X+Con(X), because the latter system should be enough to capture most “informal” reasoning that you used to construct X in the first place. The existence of such a proof wouldn’t imply the inconsistency of X directly (there are consistent systems X such that X+Con(X) is inconsistent, e.g. X=Y+not(Con(Y)) where Y is consistent), but if PA+Con(PA) were ever shown to be inconsistent, that would be highly suggestive and would cause me to abandon my intuitions expressed above.
But as far as we know now, PA+Con(PA) looks just as consistent as PA itself. Moreover, I think you can add a countable-ordinal pile of iterated Con’s on top and still get a system that’s weaker than ZFC. (I’m not 100% confident of that, would be nice if someone corrected me!)
So I’m pretty confident that PA is consistent, conditional on the statement “PA is consistent” being meaningful. Note that you need a notion of “standard integers” to make sense of the latter statement too, because integers encode proofs.
You’ve made good points (and I’ve voted up your remark) but I’d like to note a few issues:
First, we can prove the consistency of PA assuming certain other axiomatic systems. In particular, Gentzen’s theorem shows that the consistency of PA can be proved if one accepts a system which is incomparable in strength to PA (in the sense that each system has theorems which are not theorems of the other).
it cannot be bounded above by any computable function. I.e., there is an inconsistent formal system with fewer than 100 axioms in which all theorems with proofs of length smaller than BB(100) are consistent with each other.
This isn’t necessarily true unless one has a 1-1 correspondence between axiomatic systems and Turing machines, rather than just having axiomatic systems modelable as Turing machines. This is a minor detail that doesn’t impact your point in a substantial fashion.
Second, it isn’t clear how long we should expect a random Turing machine to run. I don’t know if anyone has done work on this, but it seems intuitively clear to me that if I pick a random Turing machine with n states and run it on the blank tape, I should expect with high probability that it will either halt well before BB(n) steps or never halt. Let’s make this claim more precise:
Definition: Let g(f)(n) be the (# of Turing machines with n states which when run on the blank tape either never halt or halt in fewer than f(n) steps) / (# of Turing machines with n states).
Question: Is there a computable function f(n) such that g(f)(n) goes to 1 as n goes to infinity?
If such a function exists, then we should naively expect that random axiomatic systems will likely either have an easy contradiction or be consistent. I’m not sure one way or another whether or not this function exists, but my naive guess is that it does.
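For very small n the quantity is even exactly computable, because the step-counting busy beaver values are known: S(2) = 6, so any 2-state, 2-symbol machine still running after 6 steps never halts. A sketch (the enumeration convention, with halting as a transition to an extra state that still writes and moves, is my own choice, not anything standard):

```python
from itertools import product

def run(prog, halt_state, max_steps):
    """Run a Turing machine from the blank tape.

    prog[(state, symbol)] = (write, move, next_state); reaching
    halt_state stops the machine. Returns the halting step, or None
    if the machine is still running after max_steps steps.
    """
    tape, pos, state = {}, 0, 0
    for step in range(1, max_steps + 1):
        write, move, nxt = prog[(state, tape.get(pos, 0))]
        tape[pos] = write
        pos += move
        if nxt == halt_state:
            return step
        state = nxt
    return None

n = 2                                    # number of non-halt states
keys = [(s, sym) for s in range(n) for sym in (0, 1)]
# each table entry: write 0/1, move left/right, go to a state or halt
choices = list(product((0, 1), (-1, 1), range(n + 1)))
halt_steps, never = [], 0
for entries in product(choices, repeat=len(keys)):
    t = run(dict(zip(keys, entries)), n, 50)
    if t is None:
        never += 1                       # by S(2) = 6, these never halt
    else:
        halt_steps.append(t)
```

Since max(halt_steps) comes out to 6, every one of the 12^4 = 20736 machines is exactly classified, and g(f)(2) = 1 for any f with f(2) ≥ 7. Of course nothing like this scales far: busy beaver values are known only for very small n.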
Question: Is there a computable function f(n) such that g(f)(n) goes to 1 as n goes to infinity?
Well, let’s say we want to know whether Turing machine T halts. Say T has k states. Can’t we use T’s program to find a family of programs T’ which are essentially the same as T except they consist of a common prefix followed by an arbitrary suffix? Let k’ be the number of states in the ‘prefix’. The proportion of programs with ≤ n states that belong to family T’ must be at least 2^(-k’-1) for all n ≥ k’.
Now let’s start running all programs and, as we go along, keep track of whether, for each value of n, a proportion greater than or equal to 1 − 2^(-k’-1) of programs of length ≤ n have halted in fewer than f(n) steps. Eventually this will be true for some n ≥ k’. At that point, we check whether one of the programs in T’ has halted. If not, then none of the programs in T’ can halt, and neither can T.
Therefore, such a function f could be used to solve the halting problem.
But strictly stronger in consistency strength, of course. (Consistency strength turns out to be more fundamental than logical strength simpliciter.)
Question: Is there a computable function f(n) such that g(f)(n) goes to 1 as n goes to infinity?
Well, let’s say we want to know whether Turing machine T halts. Say T has k states. Can’t we use T’s program to find a family of programs T’ which are essentially the same as T except they consist of a common prefix followed by arbitrary suffix? Let k’ be the number of states in the ‘prefix’. The proportion of programs with less than n states that belong to family T’ must tend to 2^(-k’) as n goes to infinity. If we choose n such that g(f)(n) > 1 − 2^(-k’) then we just need to run all members of T’ for f(n) steps. If none of them have halted then T does not halt.
Therefore, such a function f could be used to solve the halting problem.
If such a function exists, then we should naively expect that random axiomatic systems will likely either have an easy contradiction or be consistent. I’m not sure one way or another whether or not this function exists, but my naive guess is that it does.
You must be aware that such halting probabilities are usually uncomputable, right? In any case you’re not going to be surprised that I wouldn’t find any information about this limit of ratios compelling, any more than you would buy my argument that 15 is not prime because most numbers are not prime.
You must be aware that such halting probabilities are usually uncomputable, right?
Yes, but the existence of this function looks weaker than being able to compute Chaitin constants. Am I missing something here?
In any case you’re not going to be surprised that I wouldn’t find any information about this limit of ratios compelling, any more than you would buy my argument that 15 is not prime because most numbers are not prime.
My prior that a random integer is prime is 1/log n . If you give me a large integer, the chance that it is prime is very tiny and that is a good argument for assuming that your random integer really isn’t prime. I’m not sure why you think that’s not a good argument, at least in the context when I can’t verify it (if say the number is too large).
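The 1/log n prior is easy to sanity-check numerically; here is a quick sketch (just an illustration of the prime number theorem’s density estimate, nothing from the discussion above):

```python
from math import log

def prime_count(limit):
    """Count primes below `limit` with a simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"              # 0 and 1 are not prime
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit, p)))
    return sum(sieve)

N = 10 ** 6
density = prime_count(N) / N              # 78498 / 10^6 = 0.078498
# the prime number theorem says density ~ 1/log N; the product below
# comes out to about 1.08 at N = 10^6, approaching 1 only very slowly
print(density * log(N))
```

The slow convergence is exactly the “1/log(n) takes a long time to get small” point made in the reply below: the prior is real evidence, just weak evidence at small n.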
But 1/log(n) takes a long time to get small, so that the argument “15 is not prime because most numbers are not prime” is not very good. It seems even more specious in settings where we have less of a handle on what’s going on at all, such as with halting probabilities.
Are you trying to make a probability argument like this because you scanned my argument as saying “PA is likely inconsistent because a random axiom system is likely inconsistent?” That’s not what I’m trying to say at all.
But 1/log(n) takes a long time to get small, so that the argument “15 is not prime because most numbers are not prime” is not very good. It seems even more specious in settings where we have less of a handle on what’s going on at all, such as with halting probabilities
Sure, in the case of n=15, that’s a very weak argument. And just verifying is better, but the point is the overall thrust of the type of argument is valid Bayesian evidence.
Are you trying to make a probability argument like this because you scanned my argument as saying “PA is likely inconsistent because a random axiom system is likely inconsistent?” That’s not what I’m trying to say at all.
No. I’m confused as to what I said that gave you that impression. If you had said that, I’d actually disagree strongly (since what a reasonable distribution over “random axiomatic systems” would be is not at all obvious). My primary issue, again, was with the Turing machine statement, where it isn’t at all obvious how frequently a random Turing machine behaves like a Busy Beaver.
And just verifying is better, but the point is the overall thrust of the type of argument is valid Bayesian evidence.
I think you are being way too glib about the possibility of analyzing these foundational issues with probability. But let’s take for granted that it makes sense—the strength of this “Bayesian evidence” is
P(ratio goes to 1 | PA is inconsistent) / P(ratio goes to 1)
Now, I have no idea what the numerator and denominator actually mean in this instance, but informally speaking it seems to me that they are about the same size.
We can replace those “events” by predictions that I’m more comfortable evaluating using Bayes, e.g. P(JoshuaZ will find a proof that this ratio goes to 1 in the next few days) and P(JoshuaZ will find a proof that this ratio goes to 1 in the next few days | Voevodsky will find an inconsistency in PA in the next 10 years). Those are definitely about the same size.
Sure. There’s an obvious problem with what probabilities mean and how we would even discuss things like Turing machines if PA is inconsistent. One could talk about some model of Turing machines in Robinson arithmetic or the like.
But yes, I agree that using conventional probability in this way is fraught with difficulty.
Agree about the theorem. Not sure about the implication for explaining integers, since a human who has been assigned the task of decoding the string should have the same difficulty with that infinite input stream. His understanding of what an integer is doesn’t protect him. What seems to me to protect the human from being mentally trapped by an endless input is that his tolerance for the task is finite and eventually he’ll quit. If that’s all it is, then what you need to teach the computer is impatience.
It still feels very weird to me that we cannot explain our intuitive notion of “finiteness” to a machine. Maybe this shows that we don’t understand finiteness very well yet.
Of course we can explain it to a machine, just as we explain it to a person. By using second-order concepts (like “smallest set of thingies closed under zero and successor”). Of course then we need to leave some aspects of those second-order concepts unexplained and ambiguous—for both machines and humans.
I don’t understand what you’re referring to in your second sentence. Can you elaborate? What sorts of things need to be ambiguous?
By ‘ambiguous’, I meant to suggest the existence of multiple non-isomorphic models.
The thing that puzzled cousin_it was that the axioms of first-order Peano arithmetic can be satisfied by non-standard models of arithmetic, and that there is no way to add additional first-order axioms to exclude these unwanted models.
The solution I proposed was to use a second-order axiom of induction—working with properties (i.e.sets) of numbers rather than the first-order predicates over numbers. This approach successfully excludes all the non-standard models of arithmetic, leaving only the desired standard model of cardinality aleph nought. But it extends the domain of discourse from simply numbers to both numbers and sets of numbers. And now we are left with the ambiguity of what model of sets of numbers we want to use.
It is mildly amusing that in the case of arithmetic, the unwanted non-standard models were all too big, but in the case of set theory, people seem to prefer to think of the large models as standard and dismiss Godel’s constructive set theory as an aberation.
Depends what you mean by ‘large’ I suppose. A non-well founded model of ZFC is ‘larger’ than the well-founded submodel it contains (in the sense that it properly contains its well-founded submodel), but it certainly isn’t “standard”.
By Gödel’s constructive set theory are you talking about set theory plus the axiom of constructibility (V=L)? V=L is hardly ‘dismissed as an aberration’ any more than the field axioms are an ‘aberration’ but just as there’s more scope for a ‘theory of rings’ than a ‘theory of fields’, so adding V=L as an axiom (and making a methodological decision to refrain from exploring universes where it fails) has the effect of truncating the hierarchy of large cardinals. Everything above zero-sharp becomes inconsistent.
Furthermore, the picture of L sitting inside V that emerges from the study of zero-sharp is so stark and clear that set theorists will never want to let it go. (“No one will drive us from the paradise which Jack Silver has created for us”.)
Thank you for the clarification.
You’ve thought more about explaining things to machines than I have, but I’m not sure what you consider to count as having successfully explained a concept to a machine. For example, if you tell the machine that any string with both a beginning and an end is finite, then you’ve given the machine at least some sort of explanation of finiteness—one which I presume you consider to be inadequate. But when I try to imagine what you might find to be inadequate about it, my naive guess is that you’re thinking something like this: “even if you tell a machine that a finite string has an end, that won’t help the machine to decide whether a given input has reached its end”. In other words, the definition doesn’t give the machine the ability to identify infinite strings as such, or to identify finite strings as such in a time shorter than the length of the string itself. But my response to this is that our own human understanding of finiteness doesn’t give us that ability either. Just for starters, we don’t know whether the universe goes on forever in all directions or closes in on itself as a four-dimensional ball—and we might never know.
I want it to be able to do some of the things that humans can do, e.g. arrive at the conclusion that the Paris-Harrington theorem is “true” even though it’s independent of PA. Humans manage that by having a vague unformalized concept of “standard integers” over which it is true, and then inventing axioms that fit their intuition. So I’m kicking around different ways to make the “standard integers” more clear.
Considering that Paris-Harrington is derivable from second order arithmetic, do you think any of the ideas from reverse mathematics might come into play? This paper might of interest if you aren’t very familiar with reverse mathematics, but would like to know more.
Also, it’s my intuition that a good deal of mathematics has more to say about human cognition than it does about anything else. That said, this seems like the sort of problem that should be tackled from a computational neuroscience angle first and foremost. I’m likely wrong about the ‘first and foremost’ part, but it seems like something on numerical cognition could help out.
Also, have you looked at this work? I don’t care for the whole ‘metaphor’ camp of thought (I view it as sort of a ripoff of the work on analogies), but there seem to be a few ideas that could be distilled here.
I’m familiar with reverse mathematics and it is indeed very relevant to what I want. Not so sure about Lakoff. If you see helpful ideas in his paper, could you try to distill them?
I could give it a shot. Technically I think they are Rafael Nunez’s ideas more than Lakoff’s (though they are framed in Lakoff’s metaphorical framework). The essential idea is that mathematics is built directly from certain types of embodied cognition, and that the feeling of intuitiveness for things like limits comes from the association of the concept with certain types of actions/movements. Nunez’s papers seem to have the central goal of framing as much mathematics as possible into an embodied cognition framework.
I’m really not sure how useful these ideas are, but I’ll give it another look through. I think that at most there might be the beginnings of some useful ideas, but I get the impression that Nunez’s mathematical intuition is not top notch, which makes his ideas difficult to evaluate when he tries to go further than calculus.
Fortunately, his stuff on arithmetic appears to be the most developed, so if there is something there I think I should be able to find it.
Having now looked up the PH theorem, I don’t understand what you mean. Do you disagree with any of the following?
Computers can prove Paris-Harrington just as easily as humans can. They can also prove the strong Ramsey result that is the subject of PH as easily as humans can.
Both humans and computers are incapable of proving the Ramsey result within Peano arithmetic. Both are capable of proving it in some stronger formal systems.
Both humans and computers can “see that the Ramsey result is true” in the sense that they can verify that a certain string of symbols is a valid proof in a formal system. They are both equally capable of verifying that the Ramsey result (which concerns finite sets of integers) is true by experiment. Neither a human nor a computer can “see that the Ramsey result is true” in any stronger sense.
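That experimental verification is easy to sketch for the ordinary (unstrengthened) finite Ramsey theorem. The following Python snippet, a minimal illustration rather than anything from the thread, brute-forces the classic fact R(3,3) = 6: every 2-coloring of the edges of K_6 contains a monochromatic triangle, while K_5 admits a coloring with none.

```python
from itertools import combinations, product

def has_mono_triangle(n, coloring):
    # coloring maps each edge (i, j) with i < j of the complete graph K_n to 0 or 1
    for a, b, c in combinations(range(n), 3):
        if coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]:
            return True
    return False

def every_coloring_has_mono_triangle(n):
    # Exhaustively check all 2-colorings of K_n's edges (feasible only for tiny n).
    edges = list(combinations(range(n), 2))
    for colors in product((0, 1), repeat=len(edges)):
        if not has_mono_triangle(n, dict(zip(edges, colors))):
            return False
    return True

# R(3,3) = 6: K_5 has a triangle-free 2-coloring, K_6 does not.
print(every_coloring_has_mono_triangle(5))  # False
print(every_coloring_has_mono_triangle(6))  # True
```

For the PH-strengthened statement the same kind of brute force works in principle, but the witnesses grow far too quickly for it to be practical.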
I agree with everything in your comment except the last sentence. Sorry for being cryptic, I think this still gets the point across :-)
Can I attempt a translation/expansion for Sewing-Machine of why you disagree with the last sentence?
It seems that there’s an intuition among humans that the Ramsey result is true, in the sense that PA + PH captures our intuition of the integers more closely than PA + ~PH given the second order result. What you want is a computer to be able to make that sort of intuitive reasoning or to make it more precise. Is that more or less the point?
We can all agree that human intuition is grand but not magical, I hope? Here is my point of view: you are having difficulty teaching a computer to “make that sort of intuitive reasoning” because that sort of reasoning is not quite right.
“That sort of reasoning” is a good heuristic for discovering true facts about the world (for instance, discovering interesting sequences of symbols that constitute a formal proof of the Paris-Harrington theorem), and to that extent it surely can be taught to a computer. But it does not itself express a true fact about the world, and because of that you are limited in your ability to make it part of the premises on which a computer operates (such as the limitation discussed in the OP).
So I’ve been thinking lately, anyway.
I’m really at a loss as to why such a thing should be intuitive. The additional condition seems to me to be highly unnatural; Ramsey’s theorem is a purely graph-theoretic result, and this strengthened version involves comparing the number of vertices used to numbers that the vertices happen to correspond to, a comparison we would ordinarily consider meaningless.
If I’m following cousin_it, the idea doesn’t really have anything to do with the statement about Ramsey numbers. As I understand it, if in some system that is only slightly stronger than PA we can show some statement S of the form ∀x ∈ ℕ, P(x), then we should believe that the correct models of PA are those in which S is true. Or to put it a different way, we should think PA + S will do a better job telling us about reality than PA + ~S would. I’m not sure this can be formalized beyond that. Presumably, if he had a way to formalize this, cousin_it wouldn’t have an issue with it.
Shades of Penrosian nonsense.
A finite number is one that cannot be the cardinality of a set that has a proper subset with an equal cardinality.
This reduces the problem of explaining “standard integers” to the problem of explaining “subsets”, which is not easier. I don’t think there’s any good first-order explanation of what a “subset” is. For example, your definition fails to capture “finiteness” in some weird models of ZFC. More generally, I think “subsets” are a much more subtle concept than “standard integers”. For example, a human can hold the position that all statements in the arithmetical hierarchy have well-defined (though unknown) truth values over the “standard integers”, and at the same time think that the continuum hypothesis is “neither true nor false” because it quantifies over all subsets of the same integers. (Scott Aaronson defends something like this position here.)
Well, ZFC is a first-order theory...
Yes, but Subsets(x,y) is a primitive relationship in ZFC. I don’t really know what cousin_it means by an explanation, but assuming it’s something like a first-order definition formula, nothing like that exists in ZFC that doesn’t subsume the concept in the first place.
No, it isn’t. The only primitive relations in ZFC are set membership and possibly equality (depending on how you prefer it). “x is a subset of y” is defined to mean “for all z, z in x implies z in y”.
Can I downvote myself? Somehow my mind switched “subset” and “membership”, and by virtue of ZFC being a one-sorted theory, lo and behold, I wrote the above absurdity. Anyway, to rewrite the sentence and make it less wrong: subset(x,y) is defined by means of a first-order formula through the membership relation, which in a one-sorted theory already contains the idea of ‘subsetting’: x ∈ y → {x} ⊆ y. So subsetting can be seen as a transfinite extension of the membership relation, and in ZFC we get no more clarity or computational intuition from the first than from the second.
Set theory is not easier than arithmetic! Zero is a finite number, and N+1 is a finite number if and only if N is.
Yes, that is a much better definition. I don’t know why this one occurred to me first.
Sewing Machine’s previous comment isn’t really a definition, but it leads to the following:
“n is a finite ordinal if and only if for all properties P such that P(0) and P(k) implies P(k+1), we have P(n).”
In other words, the finite numbers are “the smallest” collection of objects containing 0 and closed under successorship.
(If “properties” means predicates then our definition uses second-order logic. Or it may mean ‘sets’ in which case we’re using set theory.)
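Written out, the second-order induction axiom described above is:

```latex
\forall P\,\Bigl[\bigl(P(0)\;\land\;\forall k\,(P(k)\rightarrow P(k+1))\bigr)\;\rightarrow\;\forall n\,P(n)\Bigr]
```

The crucial difference from the first-order induction schema is the single quantifier ∀P ranging over all properties, rather than one axiom instance per definable predicate; that is what excludes the non-standard models.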
Though with the standard definitions, that requires some form of choice.
There’s actually a reason for that—it’s impossible to define “finite” in first-order logic. There’s no set of first-order statements that’s true in all finite models and false in all infinite models.
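The standard argument is via compactness. For each n, let

```latex
\lambda_n \;\equiv\; \exists x_1 \cdots \exists x_n \!\!\bigwedge_{1 \le i < j \le n}\!\! x_i \neq x_j
```

so λ_n says “there are at least n distinct elements.” If a set T of first-order sentences were true in exactly the finite models, every finite subset of T ∪ {λ_n : n ∈ ℕ} would be satisfiable in a large enough finite model, so by compactness the whole set would have a model; that model is infinite yet satisfies T, a contradiction.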
I know that, of course. Inserting obligatory reference to Lowenheim-Skolem for future generations doing Ctrl+F.
How’s that relevant?
For present purposes, why should we care whether the logic we use is first-order?
In first-order logic, if a statement is logically valid—is true in all models—then there exists a (finite) proof of that statement. Second-order logic, however, is incomplete; there is no proof system that can prove all logically valid second-order statements.
So if you can reduce something to first-order logic, that’s a lot better than reducing it to second-order logic.
Well, now you are exhibiting the IMHO regrettable tendency of the more mathematical LWers to put way too much focus on incompleteness, uncomputability, and other negative results that have negligible chance of actually manifesting unless you are specifically looking for pathological cases or negative results.
I use second-order logic all the time. Will work fine for this purpose.
Yeah, you’re probably right...
I still do not see how first-order logic relates in any way to cousin_it’s statement in grandparent of grandparent.
Just because second-order logic is incomplete does not mean we must restrict ourselves to first-order logic.
I think I recall reading somewhere that you only need first-order logic to define Turing machines and computer programs in general, which seems to suggest that “not expressible in first order logic” means “uncomputable”… I could just be really confused about this though...
Anyway, for some reason or other I had the impression that “not expressible in first order logic” is a property that might have something in common with “hard to explain to a machine”.
ADDED. relates → supports or helps to illuminate
If we could explain what “finite” meant, then we could explain what “infinite” meant. More and more I’m starting to believe “infinite” doesn’t mean anything.
How about ‘not finite’? Even better, ‘non-halting’ or maybe even...‘not step-wise completable/obtainable’? Something is ’in-finite’ly far away if there is no set of steps that I can execute to obtain it within some determinate amount of time.
You’re repeating my point. It’s no harder to explain what “finite” means than it is to explain what “not finite” means. You take this to mean that the concept of “not finite” is easy, and I take it to mean that the concept of “finite” is hard. Cousin_it’s recent experience lends credence to my point of view.
For any N, it’s easy to explain to a computer what “cardinality smaller than N” means. It’s hard to explain to a computer what “exists N such that cardinality is smaller than N” means, and this is the source of cousin_it’s trouble. The next logical step is to recognize as a cognitive illusion the idea that humans have a special insight into what this means that computers can’t have. For some reason most people aren’t willing to take that step.
If it’s just a cognitive illusion, why is it so hard to find contradictions in axiom systems about integers that were generated by humans intuitively, like PA?
We seem to have a superpower for inventing apparently consistent axiom systems that make “intuitive sense” to us. The fact that we have this superpower is even machine-checkable to a certain degree (say, generate all proofs of up to a million symbols and look for contradictions), but the superpower itself resists attempts at formalization.
We work in axiom systems that have not been proven inconsistent because in the past we have reacted to inconsistencies (such as Russell’s paradox) by abandoning the axioms. I wouldn’t call this a superpower but a bias.
Russell’s paradox is a cliché, but it’s really illustrative. The principle of noncontradiction says that, since we have arrived at a false statement (the barber shaves himself and doesn’t shave himself), some of our premises must be wrong. Apparently the incorrect premise is: given a property P, there is a set of elements that satisfy P. What could it possibly mean that this premise is false? Evidently it means that the everyday meaning of words like “property,” “set,” and “element” is not clear enough to avoid contradictions. Why are you so sure that the everyday meanings of words like “number,” “successor,” and “least element” are so much clearer?
Here’s another reason to be unimpressed by the fact that no contradictions in PA have been found. The length of the shortest proof of falsum in a formal system has the same property as the busy beaver numbers: it cannot be bounded above by any computable function. I.e., there is an inconsistent formal system with fewer than 100 axioms in which all theorems with proofs of length smaller than BB(100) are consistent with each other. Why couldn’t PA be such a system?
I suspect that for most “naive” ways of constructing such a system X, the very long proof of inconsistency for X should reduce to a very short proof of inconsistency for X+Con(X), because the latter system should be enough to capture most “informal” reasoning that you used to construct X in the first place. The existence of such a proof wouldn’t imply the inconsistency of X directly (there are consistent systems X such that X+Con(X) is inconsistent, e.g. X=Y+not(Con(Y)) where Y is consistent), but if PA+Con(PA) were ever shown to be inconsistent, that would be highly suggestive and would cause me to abandon my intuitions expressed above.
But as far as we know now, PA+Con(PA) looks just as consistent as PA itself. Moreover, I think you can add a countable-ordinal pile of iterated Con’s on top and still get a system that’s weaker than ZFC. (I’m not 100% confident of that, would be nice if someone corrected me!)
So I’m pretty confident that PA is consistent, conditional on the statement “PA is consistent” being meaningful. Note that you need a notion of “standard integers” to make sense of the latter statement too, because integers encode proofs.
You’ve made good points (and I’ve voted up your remark) but I’d like to note a few issues:
First we can prove the consistency of PA assuming certain other axiomatic systems. In particular, Gentzen’s theorem shows that consistency of PA can be proved if one accepts a system which is incomparable in strength to PA (in the sense that there are theorems in each system which are not theorems in the other).
This isn’t necessarily true unless one has a 1-1 correspondence between axiomatic systems and Turing machines, rather than just having axiomatic systems modelable as Turing machines. This is a minor detail that doesn’t impact your point in a substantial fashion.
Second, it isn’t clear how long we expect an average Turing machine to run. I don’t know if anyone has done work on this, but it seems intuitively clear to me that if I pick a random Turing machine with n states and run it on the blank tape, I should expect with high probability that the Turing machine will halt well before BB(n) steps or will never halt. Let’s make this claim more precise:
Definition: Let g(f)(n) be the (# of Turing machines with n states which when run on the blank tape either never halt or halt in fewer than f(n) steps) / (# of Turing machines with n states).
Question: Is there a computable function f(n) such that g(f)(n) goes to 1 as n goes to infinity?
If such a function exists, then we should naively expect that random axiomatic systems will likely to either have an easy contradiction or be consistent. I’m not sure one way or another whether or not this function exists, but my naive guess is that it does.
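One can at least probe g(f)(n) empirically. The sketch below samples random n-state machines on a blank tape and reports the fraction that halt within a step budget; the machine model and the 1/(n+1) halt-transition probability are arbitrary modeling assumptions of mine, not anything fixed by the discussion. Note that it only ever gives a lower bound on the halting fraction: a machine that hasn’t halted by the budget might halt later or never, which is exactly the difficulty the question about g(f) is pointing at.

```python
import random

def random_machine(n):
    # Transition table: (state, symbol) -> (write, move, next_state), or None to halt.
    # The 1/(n+1) halt probability per entry is an arbitrary modeling choice.
    table = {}
    for state in range(n):
        for symbol in (0, 1):
            if random.random() < 1 / (n + 1):
                table[(state, symbol)] = None
            else:
                table[(state, symbol)] = (random.choice((0, 1)),
                                          random.choice((-1, 1)),
                                          random.randrange(n))
    return table

def halts_within(table, budget):
    # Simulate on an initially blank (all-zero) tape, starting in state 0.
    tape, head, state = {}, 0, 0
    for _ in range(budget):
        action = table[(state, tape.get(head, 0))]
        if action is None:
            return True
        write, move, state = action
        tape[head] = write
        head += move
    return False  # did not halt within the budget (may still halt later, or never)

def estimate_halting_fraction(n, budget=1000, samples=2000):
    hits = sum(halts_within(random_machine(n), budget) for _ in range(samples))
    return hits / samples

random.seed(0)
print(estimate_halting_fraction(4))
```

Experiments like this can suggest how quickly the halted fraction saturates as the budget grows, but they cannot settle the question, since distinguishing “halts after the budget” from “never halts” is the halting problem itself.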
Well, let’s say we want to know whether Turing machine T halts. Say T has k states. Can’t we use T’s program to find a family of programs T’ which are essentially the same as T except they consist of a common prefix followed by an arbitrary suffix? Let k’ be the number of states in the ‘prefix’. The proportion of programs with ≤ n states that belong to family T’ must be at least 2^(-k’-1) for all n ≥ k’.
Now let’s start running all programs and as we go along let’s keep track of whether, for each value of n, a proportion greater than or equal to 1 − 2^(-k’-1) of programs of length ≤ n have halted in fewer than f(n) steps. Eventually this will be true for some n ≥ k’. At that point, we check to see whether one of the programs in T’ has halted. If not, then none of the programs in T’ can halt, and neither can T.
Therefore, such a function f could be used to solve the halting problem.
But strictly stronger in consistency strength, of course. (Consistency strength turns out to be more fundamental than logical strength simpliciter.)
Well, let’s say we want to know whether Turing machine T halts. Say T has k states. Can’t we use T’s program to find a family of programs T’ which are essentially the same as T except they consist of a common prefix followed by an arbitrary suffix? Let k’ be the number of states in the ‘prefix’. The proportion of programs with fewer than n states that belong to family T’ must tend to 2^(-k’) as n goes to infinity. If we choose n such that g(f)(n) > 1 − 2^(-k’), then we just need to run all members of T’ for f(n) steps. If none of them have halted, then T does not halt.
Therefore, such a function f could be used to solve the halting problem.
You must be aware that such halting probabilities are usually uncomputable, right? In any case you’re not going to be surprised that I wouldn’t find any information about this limit of ratios compelling, any more than you would buy my argument that 15 is not prime because most numbers are not prime.
Yes, but the existence of this function looks weaker than being able to compute Chaitin constants. Am I missing something here?
My prior that a random integer is prime is 1/log n . If you give me a large integer, the chance that it is prime is very tiny and that is a good argument for assuming that your random integer really isn’t prime. I’m not sure why you think that’s not a good argument, at least in the context when I can’t verify it (if say the number is too large).
But 1/log(n) takes a long time to get small, so that the argument “15 is not prime because most numbers are not prime” is not very good. It seems even more specious in settings where we have less of a handle on what’s going on at all, such as with halting probabilities.
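The slow decay of the 1/log(n) density is easy to check empirically with a basic sieve (a quick sketch, not part of the original exchange):

```python
import math

def prime_sieve(limit):
    # Sieve of Eratosthenes: sieve[k] is True iff k is prime.
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            for multiple in range(p * p, limit + 1, p):
                sieve[multiple] = False
    return sieve

sieve = prime_sieve(10 ** 6)
for n in (10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
    density = sum(sieve[:n + 1]) / n
    print(n, round(density, 4), round(1 / math.log(n), 4))
# e.g. at n = 10**6 the prime density is 78498/10**6 ≈ 0.0785,
# versus 1/log(10**6) ≈ 0.0724 — still nearly 8% of numbers are prime.
```

Even at a million, the density has only fallen to about 0.08, which is why “most numbers aren’t prime” is such weak evidence against the primality of any particular small number like 15.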
Are you trying to make a probability argument like this because you scanned my argument as saying “PA is likely inconsistent because a random axiom system is likely inconsistent?” That’s not what I’m trying to say at all.
Sure, in the case of n=15, that’s a very weak argument. And just verifying is better, but the point is the overall thrust of the type of argument is valid Bayesian evidence.
No. I’m confused as to what I said that gives you that impression. If you had said that I’d actually disagree strongly (since what it is a reasonable distribution for “random axiomatic system” is not at all obvious). My primary issue again was with the Turing machine statement, where it isn’t at all obvious how frequently a random Turing machine behaves like a Busy Beaver.
I think you are being way too glib about the possibility of analyzing these foundational issues with probability. But let’s take for granted that it makes sense—the strength of this “Bayesian evidence” is
P(ratio goes to 1 | PA is inconsistent) / P(ratio goes to 1)
Now, I have no idea what the numerator and denominator actually mean in this instance, but informally speaking it seems to me that they are about the same size.
We can replace those “events” by predictions that I’m more comfortable evaluating using Bayes, e.g. P(JoshuaZ will find a proof that this ratio goes to 1 in the next few days) and P(JoshuaZ will find a proof that this ratio goes to 1 in the next few days | Voevodsky will find an inconsistency in PA in the next 10 years). Those are definitely about the same size.
Sure. There’s an obvious problem with what probabilities mean and how we would even discuss things like Turing machines if PA is inconsistent. One could talk about some model of Turing machines in Robinson arithmetic or the like.
But yes, I agree that using conventional probability in this way is fraught with difficulty.