Standard and Nonstandard Numbers
Followup to: Logical Pinpointing
“Oh! Hello. Back again?”
Yes, I’ve got another question. Earlier you said that you had to use second-order logic to define the numbers. But I’m pretty sure I’ve heard about something called ‘first-order Peano arithmetic’ which is also supposed to define the natural numbers. Going by the name, I doubt it has any ‘second-order’ axioms. Honestly, I’m not sure I understand this second-order business at all.
“Well, let’s start by examining the following model:”
“This model has three properties that we would expect to be true of the standard numbers - ‘Every number has a successor’, ‘If two numbers have the same successor they are the same number’, and ‘0 is the only number which is not the successor of any number’. All three of these statements are true in this model, so in that sense it’s quite numberlike—”
And yet this model clearly is not the numbers we are looking for, because it’s got all these mysterious extra numbers like C and −2*. That C thing even loops around, which I certainly wouldn’t expect any number to do. And then there’s that infinite-in-both-directions chain which isn’t corrected to anything else.
“Right, so, the difference between first-order logic and second-order logic is this: In first-order logic, we can get rid of the ABC—make a statement which rules out any model that has a loop of numbers like that. But we can’t get rid of the infinite chain underneath it. In second-order logic we can get rid of the extra chain.”
I would ask you to explain why that was true, but at this point I don’t even know what second-order logic is.
“Bear with me. First, consider that the following formula detects 2-ness:”
x + 2 = x * 2
In other words, that’s a formula which is true when x is equal to 2, and false everywhere else, so it singles out 2?
“Exactly. And this is a formula which detects odd numbers:”
∃y: x=(2*y)+1
Um… okay. That formula says, ‘There exists a y, such that x equals 2 times y plus one.’ And that’s true when x is 1, because 0 is a number, and 1=(2*0)+1. And it’s true when x is 9, because there exists a number 4 such that 9=(2*4)+1… right. The formula is true at all odd numbers, and only odd numbers.
“Indeed. Now suppose we had some way to detect the existence of that ABC-loop in the model—a formula which was true at the ABC-loop and false everywhere else. Then I could adapt the negation of this statement to say ‘No objects like this are allowed to exist’, and add that as an axiom alongside ‘Every number has a successor’ and so on. Then I’d have narrowed down the possible set of models to get rid of models that have an extra ABC-loop in them.”
Um… can I rule out the ABC-loop by saying ¬∃x:(x=A)?
“Er, only if you’ve told me what A is in the first place, and in a logic which has ruled out all models with loops in them, you shouldn’t be able to point to a specific object that doesn’t exist—”
Right. Okay… so the idea is to rule out loops of successors… hm. In the numbers 0, 1, 2, 3..., the number 0 isn’t the successor of any number. If I just took a group of numbers starting at 1, like {1, 2, 3, …}, then 1 wouldn’t be the successor of any number inside that group. But in A, B, C, the number A is the successor of C, which is the successor of B, which is the successor of A. So how about if I say: ‘There’s no group of numbers G such that for any number x in G, x is the successor of some other number y in G.’
“Ah! Very clever. But it so happens that you just used second-order logic, because you talked about groups or collections of entities, whereas first-order logic only talks about individual entities. Like, suppose we had a logic talking about kittens and whether they’re innocent. Here’s a model of a universe containing exactly three distinct kittens who are all innocent:”
Er, what are those ‘property’ thingies?
“They’re all possible collections of kittens. They’re labeled properties because every collection of kittens corresponds to a property that some kittens have and some kittens don’t. For example, the collection on the top right, which contains only the grey kitten, corresponds to a predicate which is true at the grey kitten and false everywhere else, or to a property which the grey kitten has which no other kitten has. Actually, for now let’s just pretend that ‘property’ just says ‘collection’.”
Okay. I understand the concept of a collection of kittens.
“In first-order logic, we can talk about individual kittens, and how they relate to other individual kittens, and whether or not any kitten bearing a certain relation exists or doesn’t exist. For example, we can talk about how the grey kitten adores the brown kitten. In second-order logic, we can talk about collections of kittens, and whether or not those collections exist. So in first-order logic, I can say, ‘There exists a kitten which is innocent’, or ‘For every individual kitten, that kitten is innocent’, or ‘For every individual kitten, there exists another individual kitten which adores the first kitten.’ But it requires second-order logic to make statements about collections of kittens, like, ‘There exists no collection of kittens such that every kitten in it is adored by some other kitten inside the collection.’”
I see. So when I tried to say that you couldn’t have any group of numbers, such that every number in the group was a successor of some other number in the group...
″...you quantified over the existence or nonexistence of collections of numbers, which means you were using second-order logic. However, in this particular case, it’s easily possible to rule out the ABC-loop of numbers using only first-order logic. Consider the formula:”
x=SSSx
x plus 3 is equal to itself?
“Right. That’s a first-order formula, since it doesn’t talk about collections. And that formula is false at 0, 1, 2, 3… but true at A, B, and C.”
What does the ‘+’ mean?
“Er, by ‘+’ I was trying to say, ‘this formula works out to True’ and similarly ‘¬’ was supposed to mean the formula works out to False. The general idea is that we now have a formula for detecting 3-loops, and distinguishing them from standard numbers like 0, 1, 2 and so on.”
I see. So by adding the new axiom, ¬∃x:x=SSSx, we could rule out all the models containing A, B, and C or any other 3-loop of nonstandard numbers.
“Right.”
But this seems like a rather arbitrary sort of axiom to add to a fundamental theory of arithmetic. I mean, I’ve never seen any attempt to describe the numbers which says, ‘No number is equal to itself plus 3’ as a basic premise. It seems like it should be a theorem, not an axiom.
“That’s because it’s brought in using a more general rule. In particular, first-order arithmetic has an infinite axiom schema—an infinite but computable scheme of axioms. Each axiom in the schema says, for a different first-order formula Φ(x) - pronounced ‘phi of x’ - that:”
If Φ is true at 0, i.e: Φ(0)
And if Φ is true of the successor of any number where it’s true, i.e: ∀x: Φ(x)→Φ(Sx)
Then Φ is true of all numbers: ∀n: Φ(n)
(Φ(0) ∧ (∀x: Φ(x) → Φ(Sx))) → (∀n: Φ(n))
“In other words, every formula which is true at 0, and which is true of the successor of any number of which it is true, is true everywhere. This is the induction schema of first-order arithmetic. As a special case we have the particular inductive axiom:”
(0≠SSS0 ∧ (∀x: (x≠SSSx) → (Sx≠SSSSx)) → (∀n: n≠SSSn)
But that doesn’t say that for all n, n≠n+3. It gives some premises from which that conclusion would follow, but we don’t know the premises.
“Ah, however, we can prove those premises using the other axioms of arithmetic, and hence prove the conclusion. The formula (SSSx=x) is false at 0, because 0 is not the successor of any number, including SS0. Similarly, consider the formula SSSSx=Sx, which we can rearrange as S(SSSx)=S(x). If two numbers have the same successor they are the same number, so SSSx=x. If truth at Sx proves truth at x, then falsity at x proves falsity at Sx, modus ponens to modus tollens. Thus the formula is false at zero, false of the successor of any number where it’s false, and so must be false everywhere under the induction axiom schema of first-order arithmetic. And so first-order arithmetic can rule out models like this:”
...er, I think I see? Because if this model obeys all the other axioms which which we already specified, that didn’t filter it out earlier—axioms like ‘zero is not the successor of any number’ and ‘if two numbers have the same successor they are the same number’ - then we can prove that the formula x≠SSSx is true at 0, and prove that if the formula true at x it must be true at x+1. So once we then add the further axiom that if x≠SSSx is true at 0, and if x≠SSSx is true at Sy when it’s true at y, then x≠SSSx is true at all x...
“We already have the premises, so we get the conclusion. ∀x: x≠SSSx, and thus we filter out all the 3-loops. Similar logic rules out N-loops for all N.”
So then did we get rid of all the nonstandard numbers, and leave only the standard model?
“No. Because there was also that problem with the infinite chain … −2*, −1*, 0*, 1* and so on.”
Here’s one idea for getting rid of the model with an infinite chain. All the nonstandard numbers in the chain are “greater” than all the standard numbers, right? Like, if w is a nonstandard number, then w > 3, w > 4, and so on?
“Well, we can prove by induction that no number is less than 0, and w isn’t equal to 0 or 1 or 2 or 3, so I’d have to agree with that.”
Okay. We should also be able to prove that if x > y then x + z > y + z. So if we take nonstandard w and ask about w + w, then w + w must be greater than w + 3, w + 4, and so on. So w + w can’t be part of the infinite chain at all, and yet adding any two numbers ought to yield a third number.
“Indeed, that does prove that if there’s one infinite chain, there must be two infinite chains. In other words, that original, exact model in the picture, can’t all by itself be a model of first-order arithmetic. But showing that the chain implies the existence of yet other elements, isn’t the same as proving that the chain doesn’t exist. Similarly, since all numbers are even or odd, we must be able to find v with v + v = w, or find v with v + v + 1 = w. Then v must be part of another nonstandard chain that comes before the chain containing w.”
But then that requires an infinite number of infinite chains of nonstandard numbers which are all greater than any standard number. Maybe we can extend this logic to eventually reach a contradiction and rule out the existence of an infinite chain in the first place—like, we’d show that any complete collection of nonstandard numbers has to be larger than itself -
“Good idea, but no. You end up with the conclusion that if a single nonstandard number exists, it must be part of a chain that’s infinite in both directions, i.e., a chain that looks like an ordered copy of the negative and positive integers. And that if an infinite chain exists, there must be infinite chains corresponding to all rational numbers. So something that could actually be a nonstandard model of first-order arithmetic, has to contain at least the standard numbers followed by a copy of the rational numbers with each rational number replaced by a copy of the integers. But then that setup works just fine with both addition and multiplication—we can’t prove that it has to be any larger than what we’ve already said.”
Okay, so how do we get rid of an infinite number of infinite chains of nonstandard numbers, and leave just the standard numbers at the beginning? What kind of statement would they violate—what sort of axiom would rule out all those extra numbers?
“We have to use second-order logic for that one.”
Honestly I’m still not 100% clear on the difference.
“Okay… earlier you gave me a formula which detected odd numbers.”
Right. ∃y: x=(2*y)+1, which was true at x=1, x=9 and so on, but not at x=0, x=4 and so on.
“When you think in terms of collections of numbers, well, there’s some collections which can be defined by formulas. For example, the collection of odd numbers {1, 3, 5, 7, 9, …} can be defined by the formula, with x free, ∃y: x=(2*y)+1. But you could also try to talk about just the collection {1, 3, 5, 7, 9, …} as a collection, a set of numbers, whether or not there happened to be any formula that defined it—”
Hold on, how can you talk about a set if you can’t define a formula that makes something a member or a non-member? I mean, that seems a bit smelly from a rationalist perspective -
“Er… remember the earlier conversation about kittens?”
“Suppose you say something like, ‘There exists a collection of kittens, such that every kitten adores only other kittens in the collection’. Give me a room full of kittens, and I can count through all possible collections, check your statement for each collection, and see whether or not there’s a collection which is actually like that. So the statement is meaningful—it can be falsified or verified, and it constrains the state of reality. But you didn’t give me a local formula for picking up a single kitten and deciding whether or not it ought to be in this mysterious collection. I had to iterate through all the collections of kittens, find the collections that matched your statement, and only then could I decide whether any individual kitten had the property of being in a collection like that. But the statement was still falsifiable, even though it was, in mathematical parlance, impredicative—that’s what we call it when you make a statement that can only be verified by looking at many possible collections, and doesn’t start from any particular collection that you tell me how to construct.”
Ah… hm. What about infinite universes of kittens, so you can’t iterate through all possible collections in finite time?
“If you say, ‘There exists a collection of kittens which all adore each other’, I could exhibit a group of three kittens which adored each other, and so prove the statement true. If you say ‘There’s a collection of four kittens who adore only each other’, I might come up with a constructive proof, given the other known properties of kittens, that your statement was false; and any time you tried giving me a group of four kittens, I could find a fifth kitten, adored by some kitten in your group, that falsified your attempt. But this is getting us into some rather deep parts of math we should probably stay out of for now. The point is that even in infinite universes, there are second-order statements that you can prove or falsify in finite amounts of time. And once you admit those particular second-order statements are talking about something meaningful, well, you might as well just admit that second-order statements in general are meaningful.”
...that sounds a little iffy to me, like we might get in trouble later on.
“You’re not the only mathematician who worries about that.”
But let’s get back to numbers. You say that we can use second-order logic to rule out any infinite chain.
“Indeed. In second-order logic, instead of using an infinite axiom schema over all formulas Φ, we quantify over possible collections directly, and say, in a single statement:”
∀P: P(0) ∧ (∀x: P(x) → P(Sx)) → (∀n: P(n))
“Here P is any predicate true or false of individual numbers. Any collection of numbers corresponds to a predicate that is true of numbers inside the collection and false of numbers outside of it.”
Okay… and how did that rule out infinite chains again?
“Because in principle, whether or not there’s any first-order formula that picks them out, there’s theoretically a collection that contains the standard numbers {0, 1, 2, …} and only the standard numbers. And if you treat that collection as a predicate P, then P is true at 0 - that is, 0 is in the standard numbers. And if 200 is a standard number then so is 201, and so on; if P is true at x, it’s true at x+1. On the other hand, if you treat the collection ‘just the standard numbers’ as a predicate, it’s false at −2*, false at −1*, false at 0* and so on—those numbers aren’t in this theoretical collection. So it’s vacuously true that this predicate is true at 1* if it’s true at 0*, because it’s not true at 0*. And so we end up with:”
“And so the single second-order axiom...”
∀P: P0 ∧ (∀x: Px → P(Sx)) → (∀n: Pn)
″...rules out any disconnected chains, finite loops, and indeed every nonstandard number, in one swell foop.”
But what did that axiom mean, exactly? I mean, taboo the phrase ‘standard numbers’ for a moment, pretend I’ve got no idea what those are, just explain to me what the axiom actually says.
“It says that the model being discussed—the model which fits this axiom—makes it impossible to form any collection closed under succession which includes 0 and doesn’t include everything. It’s impossible to have any collection of objects in this universe such that 0 is in the collection, and the successor of everything in the collection is in the collection, and yet this collection doesn’t contain everything. So you can’t have a disconnected infinite chain—there would then exist at least one collection over objects in this universe that contained 0 and all its successor-descendants, yet didn’t contain the chain; and we have a shiny new axiom which says that can’t happen.”
Can you perhaps operationalize that in a more sensorymotory sort of way? Like, if this is what I believe about the universe, then what do I expect to see?
“If this is what you believe about the mathematical model that you live in… then you believe that neither you, nor any adversary, nor yet a superintelligence, nor yet God, can consistently say ‘Yea’ or ‘Nay’ to objects in such fashion that when you present them with 0, they say ‘Yea’, and when you present them with any other object, if they say ‘Yea’, they also say ‘Yea’ for the successor of that object; and yet there is some object for which they say ‘Nay’. You believe this can never happen, no matter what. The way in which the objects in the universe are arranged by succession, just doesn’t let that happen, ever.”
Ah. So if, say, they said ‘Nay’ for 42, I’d go back and ask about 41, and then 40, and by the time I reached 0, I’d find either that they said ‘Nay’ about 0, or that they said ‘Nay’ for 41 and yet ‘Yea’ for 40. And what do I expect to see if I believe in first-order arithmetic, with the infinite axiom schema?
“In that case, you believe there’s no neatly specifiable, compactly describable rule which behaves like that. But if you believe the second-order version, you believe nobody can possibly behave like that even if they’re answering randomly, or branching the universe to answer different ways in different alternate universes, and so on. And note, by the way, that if we have a finite universe—i.e., we throw out the rule that every number has a successor, and say instead that 256 is the only number which has no successor—then we can verify this axiom in finite time.”
I see. Still, is there any way to rule out infinite chains using first-order logic? I might find that easier to deal with, even if it looks more complicated at first.
“I’m afraid not. One way I like to look at it is that first-order logic can talk about constraints on how the model looks from any local point, while only second-order logic can talk about global qualities of chains, collections, and the model as a whole. Whether every number has a successor is a local property—a question of how the model looks from the vantage point of any one number. Whether a number plus three, can be equal to itself, is a question you could evaluate at the local vantage point of any one number. Whether a number is even, is a question you can answer by looking around for a single, individual number x with the property that x+x equals the first number. But when you try to say that there’s only one connected chain starting at 0, by invoking the idea of connectedness and chains you’re trying to describe non-local properties that require a logic-of-possible-collections to specify.”
Huh. But if all the ‘local’ properties are the same regardless, why worry about global properties? In first-order arithmetic, any ‘local’ formula that’s true at zero and all of its ‘natural’ successors would also have to be true of all the disconnected infinite chains… right? Or did I make an error there? All the other infinite chains besides the 0-chain—all ‘nonstandard numbers’ - would have just the same properties as the ‘natural’ numbers, right?
“I’m afraid not. The first-order axioms of arithmetic may fail to pin down whether or not a Turing machine halts—whether there exists a time at which a Turing machine halts. Let’s say that from our perspective inside the standard numbers, the Turing machine ‘really doesn’t’ halt—it doesn’t halt on clock tick 0, doesn’t halt on clock tick 1, doesn’t halt on tick 2, and so on through all the standard successors of the 0-chain. In nonstandard models of the integers—models with other infinite chains—there might be somewhere inside a nonstandard chain where the Turing machine goes from running to halted and stays halted thereafter.”
“In this new model—which is fully compatible with the first-order axioms, and can’t be ruled out by them—it’s not true that ‘for every number t at which the Turing machine is running, it will still be running at t+1’. Even though if we could somehow restrict our attention to the ‘natural’ numbers, we would see that the Turing machine was running at 0, 1, 2, and every time in the successor-chain of 0.”
Okay… I’m not quite sure what the practical implication of that is?
“It means that many Turing machines which in fact never halt at any standard time, can’t be proven not to halt using first-order reasoning, because their non-halting-ness does not actually follow logically from the first-order axioms. Logic is about which conclusions follow from which premises, remember? If there are models which are compatible with all the first-order premises, but still falsify the statement ‘X runs forever’, then the statement ‘X runs forever’ can’t logically follow from those premises. This means you won’t be able to prove—shouldn’t be able to prove—that this Turing machine halts, using only first-order logic.”
How exactly would this fail in practice? I mean, where does the proof go bad?
“You wouldn’t get the second step of the induction, ‘for every number t at which the Turing machine is running, it will still be running at t+1’. There’d be nonstandard models with some nonstandard t that falsifies the premise—a nonstandard time where the Turing machine goes from running to halted. Even though if we could somehow restrict our attention to only the standard numbers, we would see that the Turing machine was running at 0, 1, 2, and so on.”
But if a Turing machine really actually halts, there’s got to be some particular time when it halts, like on step 97 -
“Indeed. But 97 exists in all nonstandard models of arithmetic, so we can prove its existence in first-order logic. Any time 0 is a number, every number has a successor, numbers don’t loop, and so on, there’ll exist 97. Every nonstandard model has at least the standard numbers. So whenever a Turing machine does halt, you can prove in first-order arithmetic that it halts—it does indeed follow from the premises. That’s kinda what you’d expect, given that you can just watch the Turing machine for 97 steps. When something actually does halt, you should be able to prove it halts without worrying about unbounded future times! It’s when something doesn’t actually halt—in the standard numbers, that is—that the existence of ‘nonstandard halting times’ becomes a problem. Then, the conclusion that the Turing machine runs forever may not actually follow from first-order arithmetic, because you can obey all the premises of first-order arithmetic, and yet still be inside a nonstandard model where this Turing machine halts at a nonstandard time.”
So second-order arithmetic is more powerful than first-order arithmetic in terms of what follows from the premises?
“That follows inevitably from the ability to talk about fewer possible models. As it is written, ‘What is true of one apple may not be true of another apple; thus more can be said about a single apple than about all the apples in the world.’ If you can restrict your discourse to a narrower collection of models, there are more facts that follow inevitably, because the more models you might be talking about, the fewer facts can possibly be true about all of them. And it’s also definitely true that second-order arithmetic proves more theorems than first-order arithmetic—for example, it can prove that a Turing machine which computes Goodstein sequences always reaches 0 and halts, or that Hercules always wins the hydra game. But there’s a bit of controversy we’ll get into later about whether second-order logic is actually more powerful than first-order logic in general.”
Well, sure. After all, just because nobody has ever yet invented a first-order formula to filter out all the nonstandard numbers, doesn’t mean it can never, ever be done. Tomorrow some brilliant mathematician might figure out a way to take an individual number x, and do local things to it using addition and multiplication and the existence or nonexistence of other individual numbers, which can tell us whether that number is part of the 0-chain or some other infinite-in-both-directions chain. It’ll be as easy as (a=b*c) -
“Nope. Ain’t never gonna happen.”
But maybe you could find some entirely different creative way of first-order axiomatizing the numbers which has only the standard model -
“Nope.”
Er… how do you know that, exactly? I mean, part of the Player Character Code is that you don’t give up when something seems impossible. I can’t quite see yet how to detect infinite chains using a first-order formula. But then earlier I didn’t realize you could rule out finite loops, which turned out to be quite simple once you explained. After all, there’s two distinct uses of the word ‘impossible’, one which indicates positive knowledge that something can never be done, that no possible chain of actions can ever reach a goal, even if you’re a superintelligence. This kind of knowledge requires a strong, definite grasp on the subject matter, so that you can rule out every possible avenue of success. And then there’s another, much more common use of the word ‘impossible’, which means that you thought about it for five seconds but didn’t see any way to do it, usually used in the presence of weak grasps on a subject, subjects that seem sacredly mysterious -
“Right. Ruling out an infinite-in-both-directions chain, using a first-order formula, is the first kind of impossibility. We know that it can never be done.”
I see. Well then, what do you think you know, and how do you think you know it? How is this definite, positive knowledge of impossibility obtained, using your strong grasp on the non-mysterious subject matter?
“We’ll take that up next time.”
Part of the sequence Highly Advanced Epistemology 101 for Beginners
Next post: “Godel’s Completeness and Incompleteness Theorems”
Previous post: “By Which It May Be Judged”
- By Which It May Be Judged by 10 Dec 2012 4:26 UTC; 95 points) (
- Godel’s Completeness and Incompleteness Theorems by 25 Dec 2012 1:16 UTC; 80 points) (
- No Really, There Are No Rules! by 7 Oct 2021 22:08 UTC; 57 points) (
- How much is known about the “inference rules” of logical induction? by 8 Aug 2020 10:45 UTC; 11 points) (
- 4 May 2015 4:04 UTC; 2 points) 's comment on Stupid Questions May 2015 by (
- 10 Sep 2020 6:13 UTC; 1 point) 's comment on Donald Hobson’s Shortform by (
No we don’t.
The model with the natural numbers and a single “integer line” is not a first order model of arithmetic. The reason is this. For a non-standard number “a” large enough there is a (non-standard) natural number that’s approximately some rational fraction of “a.” This number then has successors and predecessors, so it has an “integer line” around it. But because we can play this game for any fraction, we need lots of integer lines (ordered according to the total ordering on the rationals).
See this for details:
http://web.mit.edu/24.242/www/NonstandardModels.pdf
Yes we do.
The problem of a chain isn’t intended to be limited to the problem of exactly one chain, and I didn’t want to complicate the diagram or confuse my readers by showing them a copy of the rationals with each rational replaced by a copy of the integers. If you can’t get rid of a larger structure that has a chain in it, you can’t get rid of the chain. To put it another way, showing that the chain depicted implies further extra elements isn’t the same as ruling out the existence of that chain.
Hence the wording, “How do we get rid of the chain?” not “How do we get rid of this particular exact model here?”
A very quick way to see that there must be more than one chain is to note that if x > y, then x + z > y + z. An element of the nonstandard chain is greater than any natural number, so if we add two nonstandard numbers together, the result must be greater than the nonstandard starting point plus any natural number. Therefore there must be another chain which comes after the first one. For more on this see the linked paper.
EDIT: Several others reported misinterpreting what I had in the original, so I’ve edited the post accordingly. Thanks for raising the issue, Ilya!
It’s probably worth explicitly mentioning that the structure that you described isn’t actually a model of PA. I’d imagine that could otherwise be confusing for readers who have never seen this stuff before and are clever enough to notice the issue.
Edited.
Thanks, it makes much more sense now.
Thanks for editing!
After I had read your post but before I had read IlyaSphitser’s comment I thought that the particular model with a single integer chain was in fact a model of first-order arithmetic, so the post was definitely misleading to me in that respect.
Can someone explain this? I don’t understand how > can be a valid operation on two disconnected chains of number-thingies.
< is defined in terms of plus by saying x<y iff there exists a such that y=z+x. + is supposed to be provided as a primitive operation as part of the data consisting of a model of PA. It’s not actually possible to give a concrete description of what + looks like in general for non-standard models because of Tenenbaums’s Theorem, but at least when one of x or y (say x) is a standard number it’s exactly what you’d expect: x+y is what you get by starting at y and going x steps to the right.
To see that x<y whenever x is a standard number and y isn’t, you need to be a little tricky. You actually prove an infinite family of statements. The first one is “for all x, either x=0 or else x>0”. The second is “for all x, either x=0 or x=1 or x>1″, and in general it’s “for all x, either x=0,1,..., or n, or else x>n”. Each of these can be proven by induction, and the entire infinite family together implies that a non-standard number is bigger than every standard number.
Thanks!
I suppose you can’t prove a statement like “No matter how many times you expand this infinite family of axioms, you’ll never encounter a non-standard number” in first-order logic? Should I not think of numbers and non-standard numbers as having different types? Or should I think of > as accepting differently typed things? (where I’m using the definition of “type” from computer science, e.g. “strongly-typed language”)
Sorry I didn’t answer this before; I didn’t see it. To the extent that the analogy applies, you should think of non-standard numbers and standard numbers as having the same type. Specifically, the type of things that are being quantified over in whatever first order logic you are using. And you’re right that you can’t prove that statement in first order logic; Worse, you can’t even say it in first order logic (see the next post, on Godel’s theorems and Compactness/Lowenheim Skolem for why).
Thanks. Hm. I think I see why that can’t be said in first order logic.
...my brain is shouting “if I start at 0 and count up I’ll never reach a nonstandard number, therefore they don’t exist” at me so loudly that it’s very difficult to restrict my thoughts to only first-order ones.
This is largely a matter of keeping track of the distinction between “first order logic: the mathematical construct” and “first order logic: the form of reasoning I sometimes use when thinking about math”. The former is an idealized model of the latter, but they are distinct and belong in distinct mental buckets.
It may help to write a proof checker for first order logic. Or alternatively, if you are able to read higher math, study some mathematical logic/model theory.
In mathematics, a [binary] relation (like >, since it considers two natural numbers and then is either true or false, based on which numbers are considered) is just a set of ordered pairs. Within the standard model of the natural numbers, > is just the [infinite] collection of ordered pairs { (2,1) , (3,1) , (3,2) , (4,1) , (4,2) , (4,3) , … }. So, suppose we have two chains of number-thingies...1, 2, 3,… and 1^, 2^, 3^, …. We can make the ‘>’ rule as follows: ” ‘x > y’ if and only if ‘x has a caret and y does not, or (in this case, both numbers must be in the same chain) x is greater than y within its own chain’ ”. This [infinite] collection of ordered pairs would be { (2,1) , (2^,1^) , (1^,1) , (3^,1^) , (3,2) , (3^,2^) , (3,1) , (4^,1^) , (1^,2) , (4^,2^) , (4,3) , (4^,3^) , … }.
So ‘>’ is a valid relation on two disconnected chains of number-thingies, because we define it to be so by fiat. The numbers we’re working with are nonstandard...so there is no reason to expect that there should be some standard, natural meaning for ‘>’.
Important Note: This explanation of ‘>’ does not correspond to a nonstandard model of first-order Peano arithmetic (and, clearly, not the standard model, either). If you want to know more about that, look to earthwormchuck163′s comment. I thought it might be helpful to you to understand it in a case that’s easier to picture, before jumping to the case of a nonstandard model of first-order Peano arithmetic. That case is even more complex than Eliezer revealed within his post. It would probably be extremely helpful to you to learn about well-orders, order types, and the ordinal numbers to get a handle on this stuff. You are more talented than I if you are able to understand it without that background knowledge.
Hope this helps.
Edit: Annoyingly (in this case), the asterisk causes italicization. Changed asterisks to carets.
Edit 2: Changed “operation” to “relation” everywhere, as per paper-machine’s correct comment.
Not to nitpick, but “>” is a binary relation, not a binary operation.
Ha, thanks. I don’t mind nitpicking. I’ll edit the comment.
Actually, a binary relation is a binary operation (it returns 1 if true and 0 if false). You passed up a chance to counter-nitpick the nitpicker.
Yes, if you want a two-sorted theory, then you can make a boolean type and lift all relations to operations.
That’s not the typical use of the word “operation” in model theory, however.
Thanks, that last link was very helpful.
This really benefits from a picture. Calling something “a nonstandard number” doesn’t really convey anything about them and a better name I’ll use is “infinitely big”, because they are.
< makes sense because the 2 chains are finite numbers and infinitely big numbers and an infinitely big number is bigger than any finite one because it’s , well, infinite. I can elaborate more technically, but I think trying to develop some numeracy for infinite numbers is a lot like learning about negatives and rationals and complex numbers. Just play with some expressions and get used to them. then look at the more technical treatment even if you have the ability to read it. Someone gave that example with a flat list but I think and feel that tapping into one’s existing NUMBER (and not list) sense is very powerful since it’s the first math we learn and the only one people use every single day.
Wikipedia concurs:
Edit: Eliezer seems to have been aware of this, and gave a valid reply to your comment, so I won’t call it a “mistake” anymore. I do think some rewording or a clarifying annotation within the OP would be helpful, though.
Very nice. These notes say that every countable nonstandard model of Peano arithmetic is isomorphic, as an ordered set, to the natural numbers followed by lexicographically ordered pairs (r, z) for r a positive rational and z an integer. If I remember rightly, the ordering can be defined in terms of addition: x ⇐ y iff exists z. x+z ⇐ y. So if we want to have a countable nonstandard model of Peano arithmetic with successor function and addition we need all these nonstandard numbers.
It seems that if we only care about Peano arithmetic with the successor function, then the naturals plus a single copy of the integers is a model. If I was trying to prove this, I’d think that just looking at the successor function, to any first-order predicate an element of the copy of the integers would be indistinguishable from a very large standard natural number, by standard FO locality results.
I think whether naturals plus one non-standard integer line is a model of Peano’s axioms for the successor only (no addition/multiplication) depends on whether we use second order or first order logic to express induction. (No in second order formulation due to Dedekind’s result, yes for any first order formulation).
Eliezer, I just want to say thanks. This conversational method of teaching logic/math is very approachable and engaging to me. Much appreciated!
If you enjoyed this, you should try Gödel, Escher, Bach. The style and subject matter are very similar.
I suppose it is not simply coincidence that I am reading it right now. Thanks for the suggestion!
Consider using “☑” or similar) to mean “true”, rather than overloading “+”?
The double-turnstile ⊨ is the usual symbol for saying that a sentence is true (in a given model).
Or T and F for ‘is true’ and ‘is false’, or the T and upside-down T often used for (tautological) truth and (tautological) falsity for true and false.
Can you (or someone else too, I guess) give an example of a Turing machine with a nonstandard halting time? It’s not clear to me what you mean by running a Turing machine for a nonstandard number of steps. (I think I can make this meaningful in my favorite nonstandard model of Peano arithmetic, namely an ultrapower of the standard model, but I don’t see how to make it meaningful in general.)
Yeah, there’s a little non-obvious trick to talking about properties of Turing machines in the language of arithmetic, which is essential to understanding this.
The first thing you do is to use a little number theory to define a bijection between natural numbers and finite lists of natural numbers. Next, you define a way to encode the status of a Turing machine at one point in time as a list of numbers (giving the current state and the contents of the tapes); with your bijection, you can encode the status at one point in time as a single number. Now, you encode execution histories as finite lists of status numbers, which your bijection maps to a single number. You can write “n denotes a valid execution history that ends in a halting state” (i.e., n is a list of valid statuses, with the first one being a start status, the last one being a halting status, and each intermediate one being the uniquely determined correct successor to the previous one). After doing all this work, you can write a formula in the language of arithmetic saying “the Turing machine m halts on input i”, by simply saying “there is an n which denotes a valid execution history of machine m, starting at input i and ending in a halting state”.
Now consider an execution history consisting of a “finite” list of nonstandard length.
You get a formula which is true of the standard numbers m and i if and only if the m’th Turing machine halts on input i. Is there really any meaningful sense in which this formula is still talking about Turing machines when you substitute elements of some non-standard model?
In a sense, no. Eliezer’s point is this: Given the actual Turing machine with number m = 4 = SSSS0 and input i = 2 = SS0, you can substitute these in to get a closed formula φ whose meaning is “the Turing machine SSSS0 halts on input SS0”. The actual formula is something like, “There is a number e such that e denotes a valid execution history for machine SSSS0 on input SS0 that ends in a halting state.” In the standard model, talking about the standard numbers, this formula is true iff the machine actually halts on that input. But in first-order logic, you cannot pinpoint the standard model, and so it can happen that formula φ is false in the standard model, but true in some nonstandard model. If you use second-order logic (and believe its standard semantics, not its Henkin semantics), formula φ is valid, i.e. true in every model, if and only if machine 4 really halts on input 2.
Okay. This is exactly what I thought it should be, but the way Eliezer phrased things made me wonder if I was missing something. Thanks for clarifying.
Okay, this is what I suspected after thinking about it for a bit, but like earthwormchuck it is not clear to me in what sense we are “really” talking about running Turing machines for a nonstandard number of steps here… the interpretation I had in mind in the case of an ultrapower of the standard model is more direct: namely, running a Turing machine for the nonstandard number of steps (a1, a2, a3, …) ought to mean considering the sequence of states of the Turing machine after steps a1, a2, a3, … as an element of the ultrapower of the set of possible states of the Turing machine (in other words, after nonstandard times, the Turing machine may be in nonstandard states). It is not clear to me whether we have such an interpretation in general.
Ok—as I replied to earthwormchuck, I think Eliezer isn’t saying at all that there is a useful way in which these nonstandard execution histories are “really” talking about Turing machines, he’s saying the exact opposite: they aren’t talking about Turing machines, which is bad if you want to talk about Turing machines, since it means that first-order logic doesn’t suffice for expressing exactly what it is you do want to talk about.
WIth that interpretation, you couldn’t have a halt at a nonstandard time without halting at some standard time, right? If it were halted at some nonstandard time, it would be halted at almost all the standard times in that nonstandard time (here “almost all” is with respect to the chosen ultrafilter), and hence in particular at some standard time.
(Add here standard note for readers unused to infinity that it can be made perfectly sensible to talk about Turing machines running infinitely long and beyond but this has nothing to do with what’s being talked about here.)
Ah. Right. Somehow I totally forgot about Łoś′s theorem.
Disclaimer: I am not familiar with the formalities of Turing machines, and am quite possibly talking out of my ass, and probably not thinking along the same lines as Eliezer here. But it might be possible to salvage the ideas into something more formal/correct.
Consider a model containing exactly the natural numbers and the starred chain. Then we might have a Turing machine which starts at 0 and 0 , halts if it is fed 0, and continues to the successor otherwise. Then it never halts on the natural chain, but halts immediately on the starred chain. Here, a Turing machine presumably operates on every chain in a model meeting the first-order Peano axioms.
So in general, it might be meaningful to talk of a Turing machine acting within a model containing chains, which is closed on every given chain (e.g. it can’t jump from 0 to 0 ), and which could therefore be said to be associated by a ‘halt time’ function, h, which maps each chain (or each chain’s zero, if you like) to a nonnegative number in that chain which is the halting time on that chain. So in my above example, we might leave h(0) undefined, because the machine never halts on the naturals, and h(0 )=0*, because it halts immediately on that chain. This would then completely define the halting time over chains. (In fact, we could probably drop closure if we wanted to.)
(Edited:) I think you’re conflating the natural numbers and the tape that the Turing machine runs on. Interpreting “nonstandard halting time,” the way I think Eliezer is using the term, doesn’t require changing our notion of what a tape is; it just requires translating the statement “this Turing machine is in state s at time t” into a statement in Peano arithmetic (where t is a natural number) and then interpreting it in a nonstandard model.
I think that refers to turing machines that never halt at standard numbers of steps (i.e. it would halt at infinity, or more formally ω, which is a nonstandard number). It might also represent halting at a negative time (i.e. if you ran the turing machine backwards for N steps, then forward again for less than N steps, it would halt, but otherwise doesn’t halt). Anything that fails to halt in a standard number of steps can be considered to halt in a nonstandard number of steps, if you include the restraint that there has to be a value X such that it halts in X steps. By that definition, a turing machine halts if and only if X is a standard number. I could be wrong though.
Eliezer isn’t asking about how long a particular Turing machine takes to halt—he’s asking the binary question, “Will it halt or not?” As far as I could tell, Eliezer was claiming that there exist Turing machines that don’t halt, but that we can’t prove don’t halt using first-order Peano arithmetic. The particular example was to show how this claim was plausible (and, in fact, true).
In some cases, this isn’t even a well-defined operation.
Fails to halt. The standard numbers are the ones we care about. It’s the proof that this is the case that is nontrivial, and in some cases requires second-order logic (or at least, that’s what I think Eliezer is claiming). But you don’t always need second-order logic, so what you said (”...can be considered to halt in a nonstandard number of steps”, and really, this should be, “on a step corresponding to a nonstandard number”) was wrong.
By the way, ω isn’t a nonstandard number in countable nonstandard models of Peano arithmetic. It’s an ordinal number, not a cardinal number, so I’m not even exactly sure what you mean...but a Turing machine can’t halt at time infinity, because there’s no such thing as “time infinity”.
I really, honestly, don’t mean this reply to come off as condescending. I think it would help you to read through the Wikipedia article on Turing machines.
It should refer to a Turing machine that never halts but cannot be proven in Peano arithmetic not to halt. The second condition is important (otherwise it would just be a Turing machine that never halts, period). I know how to write down such a Turing machine (edit: for an explicit example, consider a Turing machine which is searching for a contradiction in PA); what I don’t know is how this definition can be related to a definition in terms of defining what it means to run a Turing machine for a nonstandard number of steps.
It doesn’t necessarily make sense to talk about running a Turing machine backwards. Also, models of first-order Peano arithmetic do not contain negative numbers; this is ruled out by the axiom that 0 is not a successor.
I don’t think it could halt at a negative time. If it did, it would have to stay halted, which would mean that it would still be halted at zero, so the program halts in the natural numbers.
You should be careful with addition and multiplication—to use them, you would have to define them first, and this is not trivial if you have the natural numbers plus A->B->C->A, infinite chains and so on.
In addition, “group” has a specific mathematical meaning, if you use it for arbitrary sets this is quite confusing.
You don’t have to define addition and multiplication—you can just make them be a part of your language. In fact, in first order theories of arithmetic, you have to do so because you cannot define addition and multiplication from successor in first order logic.
In other words, the difficulty is with the language not with whether you happen to be using a standard or a non-standard model. This is a general rule in model theory (and for that matter everywhere else): what you can express has to do with the language not the subject.
Does it matter if you don’t have formal rules for what you’re doing with models?
Do you expect what you’re doing with models to be formalizable in ZFC?
Does it matter if ZFC is a first-order theory?
“Does it matter if X” is not a question; “matter” is a two-place predicate (X matters to Y). What you seem to be worried about is the following: you need some set theory to talk about models of first-order logic. ZFC is a common axiomatization of set theory. But ZFC is itself a first-order theory, so it seems circular to use ZFC to talk about models of first-order logic. But if this is what you’re worried about, you should just say so directly.
If you taboo one-predicate ‘matter’, please specialize the two-place predicate (X matters to Y) to Y = “the OP’s subsequent use of this article”, and use the resulting one-place predicate.
I am not worried about apparent circularity. Once I internalized the Lowenheim-Skolem argument that first-order theories have countable “non-standard” models, then model theory dissolved for me. The syntactical / formalist view of semantics, that what mathematicians are doing is manipulating finite strings of symbols, is always a perfectly good model, in the model theoretic sense. If you want to understand what the mathematician is doing, you may look at what they’re doing, rather than taking them at their word and trying to boggle your imagination with images of bigness. Does dissolving model theory matter?
There’s plenty of encodings in mathematics—for example, using first-order predicate logic and the ZFC axioms to talk about second-order logic, or putting classical logic inside of intuitionistic logic with the double negation translation. Does the prevalence of encodings (analogous to the Turing Tarpit) matter?
Formal arguments, to be used in the real world, occur as the middle of an informal sandwich—first there’s an informal argument that the premises are appropriate or reasonable, and third there’s an informal argument interpreting the conclusion. I understand the formal part of this post, but I don’t understand the informal parts at all. Nonstandard (particularly countable) models are everywhere and unavoidable (analogously Godel showed that true but unprovable statements are everywhere and unavoidable). Against that background, the formal success of second-order logic in exiling nonstandard models of arithmetic doesn’t seem (to me) a good starting point for any third argument.
...now I find myself wishing philosophical whimsy reached more often for things like kitten innocence than contrived torture scenarios.
We can’t have both?
Not at the same time, I hope.
Every time you take box A and box B from Omega, Omicron tortures a kitten.
It was once thought that adoring cats caused one to get tortured. However, a recent medical study has come out, showing that most cat adorers have a certain gene, ACGT, and that whether someone has the gene or not, their chances of getting tortured go down if they adore cats. The strong correlation between adoring cats and getting tortured is because of a third factor, ACGT, that leads to both.
Having learned of this new study, would you choose to adore cats?
Or, indeed, should I choose not to adore cats because it might be evidence I had toxoplasmosis?
Actually, that toxoplasmosis thing is the only happiness-creating-preference-inducing, negative-side-effect disease I actually know that really works for Solomon’s Problem. You can either pet cute kittens already tested and guaranteed not to have toxoplasmosis, or refrain. This ought to be our go-to real-life example against EDT!
Done. http://intelligence.org/files/Comparison.pdf
You guys deliberately chose examples so that acronyms are entirely made up of letters also used for nucleotides, didn’t you?
I don’t think I have a choice really, I already do adore cats. Is that an actual study btw? If it is, I’m gonna cheat by getting a scientist to scan my dna and tell me if I have it.
Probably not.
Then you are mistaken about human psychology. You definitely have a choice about whether you will adore cats, it is simply one that requires action.
I hope you don’t imagine it’s an innocent kitten, though. Because even hypothetically speaking, it’s not. Eet ees an evil keeten.
Actually, whenever you two-box, my friend Clive puts his cat Omicron into Box A.
http://www.youtube.com/watch?v=SSKlYFfhBhY
Yes, he has a cat called Omicron, and he does normally go around with axioms of set theory on his t-shirt.
Every time you exclude A, alphabeta tortures 3^^^3 people with dust specks.
Nice post, but I think you got something wrong. Your structure with a single two-sided infinite chain isn’t actually a model of first order PA. If x is an element of the two-sided chain, then y=2x=x+x is another non-standard number, and y necessarily lies in a different chain since y-x=x is a non-standard number. Of course, you need to be a little bit careful to be sure that this argument can be expressed in first order language, but I’m pretty sure it can. So, as soon as there is one chain of non-standard numbers, that forces the existence of infinitely many.
Did you mean to say “connected”, or did I miss something?
Fascinating, I thought Tennanbaum’s theorem implied non-standard models were rather impossible to visualize. The non-standard model of Peano arithmetic illustrated in the diagram only gives the successor relation, there’s no definition of addition and multiplication. Tennenbaum’s theorem implies there’s no computable way to do this, but is there a proof that they can be defined at all for this particular model?
connected?
Well, this was very interesting, formal logic is a very fun topic. I just spent ~10 minutes trying to find a way in first order logic to write that axiom, as it intuitively feels (to someone who has studied formal logic at least) that there should be a way… Of course I failed, all the axioms I attempted turned out to be no more powerful then “0 is not the successor of any number”. I am deeply intrigued by this problem, and I am looking forward to your next post where you explain exactly why it’s impossible.
If you like spoilers, google “Lowenheim-Skoler”—the same technique as the proof for the “upwards” part allows you to generate non-standard models for the First-order logic version of Peano axioms in a fairly straight-forward manner.
“First, consider that the following formula detects 2-ness” Consider changing this to something like “First, consider that the following formula detects 2-ness among the numbers as we want them to be”? It wasn’t immediately obvious to me that the starred chain’s ‘-1*’ didn’t satisfy the equation.
Er, also, you might want to have only one of the interlocutors beginning sentences with “Er” lest we lose track of which is supposed to be current-you. ;)
But yeah, a nice exposition!
A couple of clarifications if somebody is as confused as me when first reading this.
In ZF we can quantify over sets because “set” is the name we use to designate the underlying objects (the set of natural numbers is an object in the theory). In Peano, the objects are numbers so we can quantify over those we cannot quantify over sets.
Predicates are more “powerful” than first-order formulas so quantifying over predicates allows us to restrict the possible models more than having an axiom for each formula. Even though every predicate is a formula, the interpretation of a predicate is determined by the model so we cannot capture all predicates by having a formula for each predicate symbol.
This is so much clearer than my college class.
I’m going to have to read the proof of the hydra game, because I pretty quickly got over 2.8k nodes and still in increasing...
It’s even worse than that, depending on how you start, you can easily get 100s of thousands of nodes...
It’s even worse than that, the function for the maximum number of nodes you end up before they start going down, if you play using the worst possible strategy, increases faster than any function which Peano arithmetic can prove to be total (i.e., it grows faster than any Turing machine run on various inputs, which Peano arithmetic can prove to halt for any input). To say that this grows faster than the Ackermann function is putting it very mildly.
Well, it doesn’t say you have to win quickly.
I was skeptical at first, but consider it this way: At each step you make a subtree simpler, and then insert an arbitrary number of copies of the simpler subtree. Eventually you must end up with a large number of copies of the simplest possible subtree, a single node off the root. Those don’t grow the hydra when removed, so you you chop them all off and then win.
I found I could see this intuitively if I chopped the top-most head of the most-complex tree for the first several rounds, in most configurations; you’ll see whatever tree you’re working on get wider, but shorter. It helps to lower the starting number of nodes to 7 or so, as well.
Yes, while it was clear on a second reading this was also clear, thanks.
Okay, my brain isn’t wrapping around this quite properly (though the explanation has already helped me to understand the concepts far better than my college education on the subject has!).
Consider the statement: “There exists no x for which, for any number k, x after k successions is equal to zero.” (¬∃x: ∃k: xS-k-times = 0, k>0 is the closest I can figure to depict it formally). Why doesn’t that axiom eliminate the possibility of any infinite or finite chain that involves a number below zero, and thus eliminate the possibility of the two-sided infinite chain?
Or… is that statement a second-order one, somehow, in which case how so?
Edit: Okay, the gears having turned a bit further, I’d like to add: “For all x, there exists a number k such that 0 after k successions is equal to x.”
That should deal with another possible understanding of that infinite chain. Or is defining k in those axioms the problem?
I made roughly a similar comment in the Logical Pinpointing post, and Kindly offered a response there.
If I understood him correctly basically it meant “you can’t use numbers to count stuff yet, until you’ve first pinpointed what a number is...”. And repetition isn’t defined in first order logic either.
Ah, so the statement is second-order.
And while I’m pretty sure you could replace the statement with an infinite number of first-order statements that precisely describe every member of the set (0S = 1, 0SS = 2, 0SSS = 3, etc), you couldn’t say “These are the only members of the set”, thus excluding other chains, without talking about the set—so it’d still be second-order.
Thanks!
It’s a bit worse than that. Even if we defined the “k-successions” operator (which is basically addition), it doesn’t actually let us do what we want. “For all x, there exists a number k such that 0 after k successions is equal to x” is always satisfied by setting k=x, even if x is some weird alternate-universe number like 2*. Granted, I have no clue what “taking 2* successions of 0” means, but...
Nice post! Are you going to get into Dedekind’s proof that all models of the second-order Peano axioms are isomorphic?
Edit: Fixed typo.
(exists x) (forall y) not (S(y) = x)
This statement is a first order statement which is true of standard natural numbers, but is not true in your model.
???
I don’t see yet how this connects to the other posts from the epistemology sequence, but it’s definitely nice. I’ve wanted to learn more mathematical logic for some time. I didn’t quite understand why exactly using an axiom schema isn’t as good as using second order logic, before I read this post.
I read that “any” as “for at least one”, rather than as “for every”. That confused me quite a bit. Maybe native speakers won’t have a problem with that, but to me the connection between “any” and “some” is too close.
It’s also not clear to my where the order relation comes from.
I think the point of this post is to demonstrate that logical pinpointing is hard. You might think that the first-order Peano arithmetic axioms logically pinpoint the natural numbers, and what this discussion will end up showing is that they just don’t because of general properties of first-order logic (specifically the Löwenheim–Skolem theorem).
If logically pinpointing something as seemingly simple as the natural numbers depends on something as seemingly nontrivial as understanding the distinction between first-order and second-order logic, then (or so I imagine the argument will continue) we shouldn’t expect logically pinpointing something like morality to be any easier. In fact we have every reason to expect it to be substantially harder.
The definition of the order relation is nontrivial. In second-order Peano arithmetic you can define addition from the successor operation by induction, and then you can define a to be less than b if there is a positive integer n such that a + n = b. My understanding is that you cannot define addition this way in first-order Peano arithmetic. Instead it is necessary to explicitly talk about addition in the axioms. From here one could also go on to explicitly talk about the order relation in the axioms.
Probably it’s because of the “no group” before it; cf “I can do anything” and “I can’t do anything”. Negations and quantifiers in English sometimes interact in weird ways, making it non-trivial to get the semantics from the syntax.
Wiktionary gives the meanings “at least one” and “no matter what kind”. The first likely doesn’t apply here, as it’s not used in a negation or question. To interpret “no matter what kind” to mean “every” seems like a stretch to me. I really do think the meaning of “any” is ambiguous here. “any” just specifies that we don’t have any further constraints on x. You could replace it with “every” or “at least one”, but not with “every even” or “at least one even”, as that would introduce a new constraint.
It doesn’t, but I was hypothesizing that the reason why on the first read it sounded to you as though it did was the negation (“no group”) before it.