edanm comments on Set Up for Success: Insights from ‘Naïve Set Theory’

edanm 28 Feb 2018 13:05 UTC
7 points
Thanks for the generous offer! What kind of requests would you like to get? Specific questions? Certain subjects? etc.
In any case, I’m going to write a bunch of stuff I’d love to have explained more thoroughly, some general, some more specific, if you can explain any of them that would of course be amazing from my point of view. Most of these are just things that came up for me while (re)learning a bunch of math.
1. Linear Algebra—I have great intuition for most matrix operations based on Strang and 3blue1brown videos. E.g. inverse, change of basis, etc. One thing missing: what is the intuitive “geometric” interpretation of matrix transpose? Considering the transpose is used all the time, I still don’t have an intuitive sense of what it does.
2. An intuitive explanation of the SVD.
3. More general: I’m trying to recreate a math degree from scratch, more or less. There are tons of lists of best textbook for each subject, but I’d love more of an overarching “here are the subjects you need to know, here’s the best order to study them” kind of list. There aren’t many great lists for these.
4. Set Theory—an explanation of the difference between transfinite induction and regular induction. I.e. IIRC, there needs to be defined a whole new kind of induction in order to do transfinite induction vs the base induction used to define things a little past N. Why is another mechanism necessary?
Thanks!
- gjm 28 Feb 2018 15:01 UTC
  11 points
  Parent
  I’ll have a go at 4. (I don’t know what is and isn’t in your brain, so this may be entirely unhelpful. Even if so, it may help make it clearer what would be helpful.)
  I think a better perspective on induction is that “regular induction” is just a special case of transfinite induction.
  Ordinary induction over the natural numbers
  When you first learn about induction, it’s usually presented something like this: “To prove that P(n) for all n, prove that P(0) and that P(n) ⇒ P(n+1).” But it’s not that uncommon to have induction proofs that work a slightly different way—e.g., going from P(n) to P(2n) and P(2n+1) -- and a better way of looking at it is this: “When proving that P(n), you are allowed to assume that P(m) for all m<n”.
  Any proof using “n->n+1” induction is already a proof using this more powerful induction principle. (Because assuming P(n-1) is just part of what it means to assume P(everything less than n).)
  Why does this more powerful principle work? Because if P(n) fails then P(m) fails for some smaller m, and then P(l) fails for some still smaller l, and so on, and you get an endless sequence of smaller and smaller numbers—but these are all positive integers, so the sequence can’t go on for ever, contradiction.
  So the key fact is that there are no infinite decreasing sequences.
  I find it helpful to think about this in terms of how the natural numbers are (or can be) constructed. 0 is a natural number; every natural number has a successor; and that’s all of them. And in some sense this is why there are no infinite decreasing sequences: because each natural number is built by a finite increasing sequence.
  Other varieties of induction
  Now, the same thing works in other contexts where you have a kind of object (above, it’s natural numbers) and a notion of comparison (above, it’s the < relation) for which there are no infinite decreasing sequences.
  (You often want to think of < as meaning “is simpler than”.)
  These tend to be objects with a recursive construction similar to that of the natural numbers. For instance, binary trees. The empty tree is a binary tree; if L and R are binary trees then the tree consisting of a root node whose children are L and R are binary trees; and that’s all of them. So now you can do induction over binary trees: try to prove P(T) allowing yourself to assume P(T’) whenever T’ is a subtree of T. Or Lisp-style lists. The empty list is a list; if L is a list and x is a thing-that-isn’t-a-list, then there’s a list with x as head and L as tail; and that’s all of them. Now you can do induction over lists: to prove P(L), allow yourself to assume that P(tail(L)).
  In these cases, you can turn this “structural induction” into ordinary induction over natural numbers: use the depth of the tree, or the length of the list. But there are other cases, and we’ll see one now.
  Transfinite induction over sets and ordinals
  Take our objects to be sets. And take < to mean “is an element of, or an element of an element of, etc.”.
  Why are there no infinite decreasing sequences? Because there are no infinite chains that look like $\dots \in x \in y \in z$ , going on for ever on the left. Why’s that? Depending on how you look at it, it’s (1) because there’s an axiom (the Axiom of Foundation) that forbids it, or (2) because sets are constructed recursively, starting from the empty set, and this recursive construction guarantees it. (If you try to turn that into an actual proof, you end up back with the Axiom of Foundation, which is what guarantees that you can think of sets as being constructed recursively.)
  Take our objects to be ordinals. And take < to be the usual ordinal comparison relation.
  Why are there no infinite decreasing sequences? Because the ordinals are well-ordered by <. Why’s that? Because (at least with the usual construction in ZF(C)) any set of ordinals is a subset of a larger ordinal, with the < relation between ordinals corresponding to the < relation within that larger ordinal.
  (These examples can’t be translated into ordinary integer inductions: there are far more ordinals than integers, and far more “ranks” of sets than integers. But we still have the no-infinite-descending-chains property!)
  These are all the same
  All of these work the same way. You have a kind of object (natural numbers, trees, lists, sets, ordinals), a notion of being smaller/simpler (comparison of integers/ordinals; subtree; tail of list; element of set), and a theorem or an axiom that guarantees that you can’t keep getting smaller/simpler for ever. And, in that context, anything that’s true of X whenever it’s true of everything simpler than X is true of all X. And that’s what induction really is.
  - Dacyn 1 Mar 2018 18:52 UTC
    4 points
    Parent
    It seems a little odd to appeal to no infinite decreasing sequences (which are much more complicated objects than integers) to justify induction. The fundamental intuition is that if we start out with things with a certain property, and only do operations that preserve the property, then we’ll never end up with anything that doesn’t have the property. (Of course in the set-theoretic case “operations” includes things like “take the set of all objects that we’ve constructed so far, even if it is infinite”.)Without this intuition it’s hard to see why it should be the case that there are no infinite decreasing sequences.
    I guess for natural numbers, the other intuition is that if a property holds for some number, then there should be a “first time” that it holds—just count upwards and check whether the property holds for each number, and stop once you get to the first one. Of course you can make this into an equivalent intuition about transfinite numbers, though it becomes a little less intuitive in that case.
    - gjm 1 Mar 2018 20:50 UTC
      2 points
      Parent
      I agree that that intuition is very helpful; that’s the point of my paragraph beginning “I find it helpful to think …”. But what distinguishes things you can do induction on from things you can’t is exactly the no-infinite-descending-sequences property (or the equivalent property that “any nonempty set has a smallest element”, which is in some sense simpler but to my mind less intuitive).
      - Dacyn 1 Mar 2018 23:36 UTC
        1 point
        Parent
        Hmm, it seems like “why is X true?” can have two different meanings, basically “what is the reason that X is true?” versus “what is a convenient fact about X that guarantees that it is true?” I guess you were answering the second question and I was answering the first.
        But actually with regards to your question, I think it is also true that whether you can view a collection as being generated by a process that creates new elements from old elements is also a distinguishing feature of whether you can do induction on the collection. If you want to make it into a formal statement in ZFC (and for the record I don’t, since I don’t think that ZFC includes the right formalism in which to make such a statement), you could say that you can do induction on a poset $(S, <)$ if and only if there exists a set $R \subseteq P (S) \times S$ (where a pair $(A, x) \in R$ is thought of as representing a rule of the form “if all elements of $A$ are in our set, then add $x$ to our set”) such that (a) for all $(A, x) \in R$ and $y < x$ , there exists $z \in A$ such that $y \leq z$ , and (b) $S$ is the smallest set $B \subseteq S$ with the property that for all $(A, x) \in R$ , if $A \subseteq B$ then $x \in B$ .
        I guess both points of view can be useful.
        gjm 2 Mar 2018 12:54 UTC
        4 points
        Parent
        In the case of pure mathematics, I think “the reason X is true” is often ill-defined; there will be different ways of thinking about the things involved in X, and they will lead to different answers to “why is X true?”.
        (Is the prime number theorem true because the zeta function has no zeros on Re(s)=1, or does the zeta function have no zeros there because of how the prime numbers are distributed? Or is PNT perhaps true because of Selberg’s “fundamental formula”, which can be used to give a proof of PNT that makes no mention of complex analysis or the zeta function—or vice versa? My inclination is to say that those high-powered theorems we use to prove PNT are true because the prime numbers are distributed as they are—but we are then in the uncomfortable situation that all known ways of proving that they’re distributed that way go via these high-powered theorems, so X is true because Y is true but we can only prove X by proving Y first.)
        It seems to me that “because” language in pure mathematics is co-opting a bunch of intuitions that were formed by considering cause-and-effect in the physical world. Causation is a tricksy notion even in the physical world, and it’s worse still in pure mathematics.
        I agree (I think; it’s easy to go astray in this stuff) that your “upward” induction principle is equivalent to the formulation in terms of well-orderings. For what it’s worth, I think your principle works “because” we can take R = { (A, smallest thing not in A) } rather than thinking that the formulation in terms of well-orderings works “because” it guarantees we can think of making new elements from old; it’s interesting how different things are more-intuitive to different people :-).
        Dacyn 3 Mar 2018 0:01 UTC
        1 point
        Parent
        I’m not assuming that my poset is totally ordered, so there isn’t always a “smallest thing not in $A$ ”. For example think of the binary trees you mentioned above, with ancestry as the order relation.
        I get the impression we do still have a substantive disagreement, so I’ll taboo “because” and instead ask the question: how are the natural numbers built? Or in other words, what does it mean to say that something is a natural number? I would say that it just means that it is one of the objects in the collection you get if you start with 0, and each time you have $n$ then you add $n + 1$ to your collection. A book such as the one discussed in the OP, however, will give a different answer: a natural number is an element of the intersection of all sets that contain 0 and are closed under the operation $n \mapsto n + 1$ . These two definitions are arguably equivalent—but such an equivalence cannot be proven in a language like ZFC, since the only way ZFC has of formalizing the first definition is to beg the question and encode it according to the second definition.
        In a sense all properties of a mathematical object are “because” of the way that it was built. So perhaps our different intuitions about “because” are based on different ideas of what it means to be a natural number? (Or perhaps not...)
        Regarding the prime number theorem: I’m not an expert so I don’t have an opinion on which way we should say that the causality goes in that particular case, but I do think a lot of the time it is useful to ask such questions, and that giving good answers (even if they aren’t the type of thing that can be rigorous) can help one understand a subject better.
        gjm 3 Mar 2018 11:29 UTC
        2 points
        Parent
        I think it’s highly debatable whether the natural numbers are built at all. Arguably they’re just there (in some sense). One can construct particular “implementations” of the natural numbers, and there are many ways to do it; for instance, the usual way to do it in NF is a Frege-like construction: natural numbers are equivalence classes of finite sets under the relation “can be put in bijection with”; “finite” means “can’t be put in bijection with any finite subset”. (You can’t do that in ZF(C) because there are too many finite sets, but perhaps you can do it in a system that adds proper classes, like NBG.)
        I don’t have strong feelings about how the natural numbers “should” be built, or what they “really” are. I’m happy thinking of them as “sizes of finite sets”, or as the things you get if you start at 0 and add 1 as often as you like (though there’s a certain circularity about that definition), or even as finite sequences of bits (though, again, there’s some circularity there). I don’t think it’s coincidence that these all lead to the same theorems, but I don’t feel any particular need to pick one definition and say that the others are all somehow parasitic on it.
        Incidentally, when Frege came to define the natural numbers (this was in 1884, a few years before the usual Peano axioms were formulated, and I think he was the first person to do anything of the kind) he did it by (1) defining cardinal numbers as equivalence classes of sets under the same-size relation, and then (2) saying that a natural number is anything you can get to 0 from by counting downwards. Make of that what you will.
- Qiaochu_Yuan 28 Feb 2018 20:22 UTC
  5 points
  Parent
  What kind of requests would you like to get? Specific questions? Certain subjects? etc.
  The more specific the better, basically.
  One thing missing: what is the intuitive “geometric” interpretation of matrix transpose? Considering the transpose is used all the time, I still don’t have an intuitive sense of what it does.
  It’s unclear to me what counts as a “geometric” interpretation here. The coordinate-free interpretation is that taking transposes is the coordinate-dependent version of taking the adjoint of a linear operator between inner product spaces. There’s also an interpretation in terms of SVD which may be more explicitly geometric: taking transposes swaps the left and right singular vectors. (This is one way to think about the spectral theorem: it means a symmetric matrix is a matrix whose left and right singular vectors agree, which means they’re eigenvectors. This isn’t quite a proof because singular vectors aren’t quite unique but it’s pretty close.)
  An intuitive explanation of the SVD.
  I wrote a blog post about SVD; not sure whether it will give you what you want but I like it.
  an explanation of the difference between transfinite induction and regular induction. I.e. IIRC, there needs to be defined a whole new kind of induction in order to do transfinite induction vs the base induction used to define things a little past N. Why is another mechanism necessary?
  As gjm says, ordinary induction is a special case of transfinite induction so it’s not exactly a whole new kind of induction, just a (pretty mild, all things considered) generalization.
  - edanm 1 Mar 2018 20:27 UTC
    3 points
    Parent
    “Geometric” intuition is basically the way that the 3Blue1Brown YouTube channel would explain things. I’m not sure if you’re aware of it, but their “Essence of Linear Algebra” goes through the broad high-level concepts of linear algebra and explains them, with a very visual/geometric intuition for things like basis change, inverses, determinants, etc.
    Unfortunately, they never covered transpose :)
    Also, I’ll take a look at your blog post, thanks!
- TurnTrout 28 Feb 2018 14:35 UTC
  4 points
  Parent
  I second this; I too would appreciate a Qiaochu’s-eye view. Also, are there hidden dependencies in the MIRI reading list? I feel like some subjects would be most profitably studied before or after others—category theory is mentioned as helping you understand the underlying structures of mathematics and generalize your understanding—so does that mean read early, or read near completion of the list?
  - Qiaochu_Yuan 28 Feb 2018 20:29 UTC
    8 points
    Parent
    As Gram Stone says, most people think category-theory-first is a bad idea. I agree that it won’t suit most people. My recommendation is to learn category theory simultaneously with everything else, and also to learn as much of it as seems helpful for understanding whatever else you’re learning, and no more; this is what I did. For example, I learned about adjoint functors as I was doing enough abstract algebra to run into interesting examples of adjoint functors, such as the induction and restriction functors on group representations. I never went through a category theory textbook (in general I mostly learn from blog posts, Wikipedia, the nLab, etc.) and so never learned things that didn’t seem useful for something else.
    In general I’m a big fan of learning many fields simultaneously, so it’s easier to see connections between them. The relevant dependencies don’t parse into subjects for me; they’re smaller conceptual chunks like “understand, in any of the places where it appears, the concept of currying, or else you won’t understand a ton of things like how to pass between the two standard description of group actions, or why the double dual of a finite-dimensional vector space is naturally the same vector space again.” (It took me a few hours of frustrated thinking to really grok this and once I did I was able to use it smoothly everywhere it appeared forever.)
    I like Conceptual Mathematics a lot.
  - Gram Stone 28 Feb 2018 15:08 UTC
    5 points
    Parent
    Re: category-theory-first approaches; I find that most people think this is a bad idea because most people need to see concrete examples before category theory clicks for them, otherwise it’s too general, but a few people feel differently and have published introductory textbooks on category theory that assume less background knowledge than the standard textbooks. If you’re interested, you could try Awodey’s Category Theory (free), or Lawvere’s Conceptual Mathematics. After getting some more basics under your belt, you could give either of those a shot, just in case you’re the sort of person who learns faster by seeing the general rule before the concrete examples. (These people exist, but I think it’s easy to fall into the trap of wishing that you were that sort of person and banging your head against the general rule when you really just need to pick up the concrete examples first. One should update if first-principles approaches are not working.)
- Richard_Ngo 28 Feb 2018 19:41 UTC
  2 points
  Parent
  I’d also add to 2: an intuitive explanation of eigenvectors and their properties in general. They seem to pop up everywhere in Linear Algebra and when then do, people gesture towards a set of intuitions about them that I haven’t managed to pick up.
  - Qiaochu_Yuan 28 Feb 2018 20:53 UTC
    8 points
    Parent
    The first secret is that eigenvectors are not really the point; the fundamental concept is eigenspaces, and the point of eigenvectors is to furnish bases for eigenspaces. Eigenspaces are unique, but you have to make choices to pick out a basis of eigenvectors.
    An eigenspace for a linear operator $T : V \to V$ is a subspace $W \subseteq V$ which is $T$ -invariant (meaning that $T W \subseteq W$ ) and on which $T$ acts as multiplication by some scalar $λ$ (the “eigenvalue”). You would love to have such a thing if it existed, because multiplication by a scalar is super simple to understand (for example, it’s super easy to compute powers of $T$ acting on $W$ since you just take powers of $λ$ ). But there’s no reason a priori to expect that such things exist.
    The miracle is that if $V$ is finite-dimensional and we’re working over an algebraically closed field such as $C$ , then $T$ always has a nontrivial eigenspace, and in fact most of the time $V$ is the direct sum of the eigenspaces of $T$ . (In general $V$ is the direct sum of the generalized eigenspaces of $T$ ; this is a slightly weaker version of the existence of Jordan normal form.)
    In other words, most of the time you can split $V$ up into a bunch of directions such that $T$ acts by scaling each of those directions (this intuition is complicated by the fact that we’re working over a field like $C$ but as a first pass this isn’t bad). I want to underline that this is a miracle; there was no reason a priori that linear operators on finite-dimensional vector spaces should behave this simply, and in the infinite-dimensional case things are much more complicated.
    I don’t know if that addressed any of your confusions; as I said above, more specific questions are better.
- gjm 28 Feb 2018 15:09 UTC
  2 points
  Parent
  On 3: when doing a mathematics degree, you will often get a fair bit of choice of what subjects to study in depth. So any list of subjects-you-need-to-know will need to be somewhat variable—or else end up covering more than most people would do in their actual undergraduate studies. Could you say more about what areas you’re interested in?
  - edanm 1 Mar 2018 20:24 UTC
    4 points
    Parent
    Theoretically, I’m most interested in things related to Data Science/Machine Learning/Deep Learning, as that’s my day job. However, since this is my day job, it’s also the area that I know the most about. So e.g. I’ve studied Linear Algebra, Probability, Statistics etc quiet a bit.
    I’m mostly interested in rounding out my specific knowledge with everything else I need to know to have knowledge equivalent to a well-rounded math major.
    In terms of what personally interests me the most, that’d be logic/foundations-of-mathematics type stuff, e.g. Set Theory.