I’ll have a go at 4. (I don’t know what is and isn’t in your brain, so this may be entirely unhelpful. Even if so, it may help make it clearer what would be helpful.)
I think a better perspective on induction is that “regular induction” is just a special case of transfinite induction.
Ordinary induction over the natural numbers
When you first learn about induction, it’s usually presented something like this: “To prove that P(n) for all n, prove that P(0) and that P(n) ⇒ P(n+1).” But it’s not that uncommon to have induction proofs that work a slightly different way—e.g., going from P(n) to P(2n) and P(2n+1)—and a better way of looking at it is this: “When proving that P(n), you are allowed to assume that P(m) for all m<n”.
Any proof using “n → n+1” induction is already a proof using this more powerful induction principle. (Because assuming P(n−1) is just part of what it means to assume P(everything less than n).)
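To make the P(2n)/P(2n+1) example concrete, here’s a minimal Python sketch (my own illustration, not from the thread): computing a binary representation of n recurses on ⌊n/2⌋, so its correctness proof assumes P(m) for a smaller m that isn’t n−1, exactly the strong-induction pattern.

```python
def to_binary(n):
    """Binary digits of n, most significant first.

    The recursion mirrors strong induction: correctness for n
    is justified by correctness for n // 2 (which is smaller),
    not merely by correctness for n - 1.
    """
    if n == 0:
        return [0]           # base case: P(0)
    if n == 1:
        return [1]           # base case: P(1)
    return to_binary(n // 2) + [n % 2]   # P(n // 2) gives P(n)

print(to_binary(13))  # [1, 1, 0, 1], i.e. 13 = 0b1101
```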
Why does this more powerful principle work? Because if P(n) fails then P(m) fails for some smaller m, and then P(l) fails for some still smaller l, and so on, and you get an endless sequence of smaller and smaller numbers—but these are all natural numbers, so the sequence can’t go on for ever, contradiction.
So the key fact is that there are no infinite decreasing sequences.
I find it helpful to think about this in terms of how the natural numbers are (or can be) constructed. 0 is a natural number; every natural number has a successor; and that’s all of them. And in some sense this is why there are no infinite decreasing sequences: because each natural number is built by a finite increasing sequence.
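This no-infinite-decreasing-sequences fact is also what justifies termination arguments in programming. A small Python sketch (my example, under the usual reading of a “loop variant”): in Euclid’s algorithm the second component is a natural number that strictly decreases on every iteration, so the loop cannot run for ever.

```python
def gcd(a, b):
    """Euclid's algorithm. Termination argument: b is a natural
    number and strictly decreases each iteration (a % b < b), and
    there is no infinite strictly decreasing sequence of naturals."""
    while b != 0:
        a, b = b, a % b
    return a

print(gcd(48, 18))  # 6
```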
Other varieties of induction
Now, the same thing works in other contexts where you have a kind of object (above, it’s natural numbers) and a notion of comparison (above, it’s the < relation) for which there are no infinite decreasing sequences.
(You often want to think of < as meaning “is simpler than”.)
These tend to be objects with a recursive construction similar to that of the natural numbers. For instance, binary trees. The empty tree is a binary tree; if L and R are binary trees then the tree consisting of a root node whose children are L and R is a binary tree; and that’s all of them. So now you can do induction over binary trees: try to prove P(T) allowing yourself to assume P(T’) whenever T’ is a subtree of T. Or Lisp-style lists. The empty list is a list; if L is a list and x is a thing-that-isn’t-a-list, then there’s a list with x as head and L as tail; and that’s all of them. Now you can do induction over lists: to prove P(L), allow yourself to assume that P(tail(L)).
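Here’s a Python sketch of the tree case (the representation of a tree as None or a pair of subtrees is my own choice, nothing canonical): a structurally recursive function is verified by structural induction, assuming the property for L and R while proving it for the whole tree; the depth function also gives the natural-number measure that turns structural induction into ordinary induction.

```python
# A binary tree is either None (the empty tree) or a pair (left, right).

def size(tree):
    """Number of nodes. To prove a property of size(T) by structural
    induction: check the empty tree, then assume it for L and R and
    deduce it for the tree (L, R)."""
    if tree is None:
        return 0
    left, right = tree
    return 1 + size(left) + size(right)

def depth(tree):
    """Height of the tree: the measure that converts structural
    induction over trees into ordinary induction over naturals."""
    if tree is None:
        return 0
    left, right = tree
    return 1 + max(depth(left), depth(right))

t = ((None, None), (None, (None, None)))
print(size(t), depth(t))  # 4 3
```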
In these cases, you can turn this “structural induction” into ordinary induction over natural numbers: use the depth of the tree, or the length of the list. But there are other cases, and we’ll see one now.
Transfinite induction over sets and ordinals
Take our objects to be sets. And take < to mean “is an element of, or an element of an element of, etc.”.
Why are there no infinite decreasing sequences? Because there are no infinite chains that look like ⋯∈x∈y∈z, going on for ever on the left. Why’s that? Depending on how you look at it, it’s (1) because there’s an axiom (the Axiom of Foundation) that forbids it, or (2) because sets are constructed recursively, starting from the empty set, and this recursive construction guarantees it. (If you try to turn that into an actual proof, you end up back with the Axiom of Foundation, which is what guarantees that you can think of sets as being constructed recursively.)
Take our objects to be ordinals. And take < to be the usual ordinal comparison relation.
Why are there no infinite decreasing sequences? Because the ordinals are well-ordered by <. Why’s that? Because (at least with the usual construction in ZF(C)) any set of ordinals is a subset of a larger ordinal, with the < relation between ordinals corresponding to the < relation within that larger ordinal.
(These examples can’t be translated into ordinary integer inductions: there are far more ordinals than integers, and far more “ranks” of sets than integers. But we still have the no-infinite-descending-chains property!)
These are all the same
All of these work the same way. You have a kind of object (natural numbers, trees, lists, sets, ordinals), a notion of being smaller/simpler (comparison of integers/ordinals; subtree; tail of list; element of set), and a theorem or an axiom that guarantees that you can’t keep getting smaller/simpler for ever. And, in that context, anything that’s true of X whenever it’s true of everything simpler than X is true of all X. And that’s what induction really is.
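The common pattern can even be written down once as a generic recursion scheme. A hedged Python sketch (the names `smaller`, `combine`, and `wf_fold` are my own): you supply, for each object, the list of things simpler than it and a way to combine their values; the recursion terminates exactly when the “simpler than” relation admits no infinite descending chains.

```python
def wf_fold(smaller, combine, x):
    """Generic well-founded recursion: the value at x is computed
    from the values at the things simpler than x. Terminates
    whenever 'smaller' admits no infinite descending chains."""
    return combine(x, [wf_fold(smaller, combine, y) for y in smaller(x)])

# Natural numbers: everything below n is simpler; count what's below.
print(wf_fold(lambda n: range(n), lambda n, vals: len(vals), 3))  # 3

# Lisp-style lists as nested pairs: the tail is the one simpler part.
lst = (1, (2, (3, None)))
length = wf_fold(lambda l: [l[1]] if l is not None else [],
                 lambda l, vals: 0 if l is None else 1 + vals[0],
                 lst)
print(length)  # 3
```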
It seems a little odd to appeal to no infinite decreasing sequences (which are much more complicated objects than integers) to justify induction. The fundamental intuition is that if we start out with things with a certain property, and only do operations that preserve the property, then we’ll never end up with anything that doesn’t have the property. (Of course in the set-theoretic case “operations” includes things like “take the set of all objects that we’ve constructed so far, even if it is infinite”.) Without this intuition it’s hard to see why it should be the case that there are no infinite decreasing sequences.
I guess for natural numbers, the other intuition is that if a property holds for some number, then there should be a “first time” that it holds—just count upwards and check whether the property holds for each number, and stop once you get to the first one. Of course you can make this into an equivalent intuition about transfinite numbers, though it becomes a little less intuitive in that case.
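That “count upwards and stop at the first one” intuition is literally an algorithm. A minimal Python sketch (my illustration; the function name is made up, and the search only halts if the property does hold somewhere):

```python
from itertools import count

def first_such(P):
    """Least-number principle as a search: if P holds for some
    natural number, counting upwards finds the first one.
    (If P holds nowhere, this loops for ever.)"""
    return next(n for n in count() if P(n))

print(first_such(lambda n: n * n > 50))  # 8, since 7*7 = 49 <= 50
```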
I agree that that intuition is very helpful; that’s the point of my paragraph beginning “I find it helpful to think …”. But what distinguishes things you can do induction on from things you can’t is exactly the no-infinite-descending-sequences property (or the equivalent property that “any nonempty set has a smallest element”, which is in some sense simpler but to my mind less intuitive).
Hmm, it seems like “why is X true?” can have two different meanings, basically “what is the reason that X is true?” versus “what is a convenient fact about X that guarantees that it is true?” I guess you were answering the second question and I was answering the first.
But actually with regards to your question, I think it is also true that whether you can view a collection as being generated by a process that creates new elements from old elements is also a distinguishing feature of whether you can do induction on the collection. If you want to make it into a formal statement in ZFC (and for the record I don’t, since I don’t think that ZFC includes the right formalism in which to make such a statement), you could say that you can do induction on a poset (S,<) if and only if there exists a set R⊆P(S)×S (where a pair (A,x)∈R is thought of as representing a rule of the form “if all elements of A are in our set, then add x to our set”) such that (a) for all (A,x)∈R and y<x, there exists z∈A such that y≤z, and (b) S is the smallest set B⊆S with the property that for all (A,x)∈R, if A⊆B then x∈B.

I guess both points of view can be useful.
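For a finite poset this characterization can be checked mechanically. A Python sketch (with made-up names, and taking the naturals 0..4 with the rules (A,x) = ({0,…,x−1}, x) as the example): condition (b) says S is the least rule-closed set, which we can compute as an iterated fixed point.

```python
def inductive_closure(S, R):
    """Least subset B of S closed under the rules: for (A, x) in R,
    if A is a subset of B then x is in B. Computed by iterating to a
    fixed point."""
    B = set()
    changed = True
    while changed:
        changed = False
        for A, x in R:
            if A <= B and x not in B:
                B.add(x)
                changed = True
    return B

S = {0, 1, 2, 3, 4}
# Rule (A, x): "if everything below x is already in, add x".
R = [(frozenset(range(x)), x) for x in S]

# Condition (a): for every rule (A, x) and every y < x,
# some z in A has y <= z (here y itself is in A).
assert all(any(y <= z for z in A) for A, x in R for y in range(x))
# Condition (b): S is the least rule-closed set.
assert inductive_closure(S, R) == S
print("induction is valid on this poset")
```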
In the case of pure mathematics, I think “the reason X is true” is often ill-defined; there will be different ways of thinking about the things involved in X, and they will lead to different answers to “why is X true?”.
(Is the prime number theorem true because the zeta function has no zeros on Re(s)=1, or does the zeta function have no zeros there because of how the prime numbers are distributed? Or is PNT perhaps true because of Selberg’s “fundamental formula”, which can be used to give a proof of PNT that makes no mention of complex analysis or the zeta function—or vice versa? My inclination is to say that those high-powered theorems we use to prove PNT are true because the prime numbers are distributed as they are—but we are then in the uncomfortable situation that all known ways of proving that they’re distributed that way go via these high-powered theorems, so X is true because Y is true but we can only prove X by proving Y first.)
It seems to me that “because” language in pure mathematics is co-opting a bunch of intuitions that were formed by considering cause-and-effect in the physical world. Causation is a tricksy notion even in the physical world, and it’s worse still in pure mathematics.
I agree (I think; it’s easy to go astray in this stuff) that your “upward” induction principle is equivalent to the formulation in terms of well-orderings. For what it’s worth, I think your principle works “because” we can take R = { (A, smallest thing not in A) } rather than thinking that the formulation in terms of well-orderings works “because” it guarantees we can think of making new elements from old; it’s interesting how different things are more-intuitive to different people :-).
I’m not assuming that my poset is totally ordered, so there isn’t always a “smallest thing not in A”. For example think of the binary trees you mentioned above, with ancestry as the order relation.
I get the impression we do still have a substantive disagreement, so I’ll taboo “because” and instead ask the question: how are the natural numbers built? Or in other words, what does it mean to say that something is a natural number? I would say that it just means that it is one of the objects in the collection you get if you start with 0, and each time you have n then you add n+1 to your collection. A book such as the one discussed in the OP, however, will give a different answer: a natural number is an element of the intersection of all sets that contain 0 and are closed under the operation n↦n+1. These two definitions are arguably equivalent—but such an equivalence cannot be proven in a language like ZFC, since the only way ZFC has of formalizing the first definition is to beg the question and encode it according to the second definition.
In a sense all properties of a mathematical object are “because” of the way that it was built. So perhaps our different intuitions about “because” are based on different ideas of what it means to be a natural number? (Or perhaps not...)
Regarding the prime number theorem: I’m not an expert so I don’t have an opinion on which way we should say that the causality goes in that particular case, but I do think a lot of the time it is useful to ask such questions, and that giving good answers (even if they aren’t the type of thing that can be rigorous) can help one understand a subject better.
I think it’s highly debatable whether the natural numbers are built at all. Arguably they’re just there (in some sense). One can construct particular “implementations” of the natural numbers, and there are many ways to do it; for instance, the usual way to do it in NF is a Frege-like construction: natural numbers are equivalence classes of finite sets under the relation “can be put in bijection with”; “finite” means “can’t be put in bijection with any proper subset of itself”. (You can’t do that in ZF(C) because the equivalence classes would be proper classes: there are too many finite sets of each size. But perhaps you can do it in a system that adds proper classes, like NBG.)
I don’t have strong feelings about how the natural numbers “should” be built, or what they “really” are. I’m happy thinking of them as “sizes of finite sets”, or as the things you get if you start at 0 and add 1 as often as you like (though there’s a certain circularity about that definition), or even as finite sequences of bits (though, again, there’s some circularity there). I don’t think it’s coincidence that these all lead to the same theorems, but I don’t feel any particular need to pick one definition and say that the others are all somehow parasitic on it.
Incidentally, when Frege came to define the natural numbers (this was in 1884, a few years before the usual Peano axioms were formulated, and I think he was the first person to do anything of the kind) he did it by (1) defining cardinal numbers as equivalence classes of sets under the same-size relation, and then (2) saying that a natural number is anything you can get to 0 from by counting downwards. Make of that what you will.