Epistemic status: Unorganized thoughts of mine generated after reading this book—not a concerted attempt to convey any single particular mathematical insight.

Set theory is the mathematical study of collections. These collections—sets—can be either the collection of nothing (the empty set), or collections… of other collections. Countries, for example, are sets of people, making the United Nations a set of sets of people.

Somewhat surprisingly, just about all the rest of mathematics can be framed as the study of sets (of sets of sets of sets… ). Knowing some set theory lets you translate the myriad objects and claims that you encounter elsewhere in the various areas of math into more set theory, putting math on one common intellectual foundation.

Contents and Notes

1. The Axiom of Extension

2. The Axiom of Specification

All the basic principles of set theory, except only the axiom of extension, are designed to make new sets out of old ones (p. 4).

Reading this book, I picked up the idea that sets with arbitrary properties aren’t simply assumed to exist in math. As Halmos puts it,

It is impossible, especially in mathematics, to get something for nothing. To specify a set, it is not enough to pronounce some magic words (which may form a sentence such as “ $x \notin x$ ”); it is necessary also to have at hand a set to whose elements the magic words apply (p. 7).

Another related “correctly orienting towards mathematics” maxim along these lines is: “assume nothing, prove nothing.” You don’t do math without any premises, because nothing interesting follows from no premises at all. Nor do you freely assume arbitrary premises and see what follows deductively from them. Sets in set theory are not generally assumed to exist—with the sole exception of the axiom of infinity, our existential starting point. Sets in set theory are instead constructed in steps, from the axiom of infinity, using the existential conditionals of the remaining axioms of set theory.

3. Unordered Pairs

4. Unions and Intersections

5. Complements and Powers

6. Ordered Pairs

The ordered pair of $a$ and $b$ , with the first coordinate $a$ and second coordinate $b$ , is the set $(a, b)$ defined by
$(a, b) = {{a}, {a, b}}$
(p. 23).

Sets do not natively track the ordering of their elements: ${x, y, z} = {z, y, x}$ . We can introduce the notion of ordering of elements, though, in the above fashion.

You can begin to see from the above treatment how something like a shape in the Cartesian plane can be formalized as an unwieldly set.

7. Relations

8. Functions

Functions can be understood as sets too. Think of a function as a database (potentially infinitely large) of ordered pairs, with $a$ representing elements in its domain and $b$ elements in its range. $a$ and $b$ can be anything, so so long as $a$ and $b$ themselves can be formalized as sets so can the functions defined on them.

Since this is not the appropriate occasion from lengthy heuristic preliminaries, we shall proceed directly to the definition, even at risk of temporarily shocking or worrying some readers. Here it is: we define $0$ , $1$ , and $2$ by writing
$0 = \emptyset, 1 = {\emptyset}, and 2 = {\emptyset, {\emptyset}}$
In other words, $0$ is empty,^[1] $1$ is the singleton ${0}$ , and $2$ is the pair ${0, 1}$ . Observe that there is some method in this apparent madness; the number of elements in the sets $0$ , $1$ , or $2$ … is, respectively, zero, one, or two (pp. 32-3).

Thus, natural numbers are constructable as sets, and so are straightforward functions from the naturals to the naturals.

I wish the terms injection, surjection, and bijection made an appearance in the book, in place of one-to-one, onto, and one-to-one correspondence, respectively. I tripped over the latter terms a bit (“Wait a second… was that supposed to be an injection or a bijection?”).

9. Families

Suppose, for instance, that $x$ is a function from a set $I$ to a set $X$ . (The very choice of letters indicates that something strange is afoot.) An element of the domain $I$ is called an index, $I$ is called the index set, the range of the function is called an indexed set, the function itself is called a family, and the value of the function $x$ at an index $i$ , called a term of the family, is denoted by $x_{i}$ . (This terminology is not absolutely established, but it is one of the standard choices among related slight variants; in the sequel it and it alone will be used.) An unacceptable but generally accepted way of communicating the notation and indicating the emphasis is to speak of a family ${x_{i}}$ in $X$ , or of a family ${x_{i}}$ of whatever the elements of $X$ may be; when necessary, the index set $I$ is indicated by some such parenthetical expression such as $(i \in I)$ (p. 34).

The above “unacceptable” notation for a family did indeed trip me up a few times. It’s easily the worst piece of notation in the book. (You’d guess that a symbol like ${A_{i}}$ would equal the set containing only $A_{i}$ , but this is not at all the case!)

10. Inverses and Composites

This chapter got me to actually think somewhat carefully about associativity:

Let $f$ be a function from $X$ to $Y$ , and $f^{- 1}$ its inverse.

The algebraic behavior of $f^{- 1}$ is unexceptionable. If ${B_{i}}$ is a family of subsets of $Y$ , then
$f^{- 1} (⋃ i B_{i}) = ⋃ i f^{- 1} (B_{i})$ $f^{- 1} (⋂ i B_{i}) = ⋂ i f^{- 1} (B_{i})$
(p. 39).

Associativity seemed unobjectionable and uninteresting in high-school algebra. What was missing then, I think, was an appreciation of how so many functions irreversibly throw away information about the object in question. Associativity means something like “no information about the object we’re discussing is discarded by these functions, so you may evaluate the functions in any order and always come to the same terminal conclusion.”

If you want the set of all people on the planet, it doesn’t matter whether you pool all the countries first into the UN, then count the UN’s population, or you first count every country’s individual population, then sum those population counts. Neither approach “changes the subject,” so you get to the same figure in the end.

11. Numbers

The successor $x^{+}$ of a set $x$ is the set of all the elements of $x$ together with $x$ itself:

x^{+} = x \cup {x} .

From what has been said so far it does not follow that the construction of successors can be carried out ad infinitum within one and the same set. What we need is a new set-theoretic principle.
Axiom of infinity. There exists a set containing $0$ and containing the successor of each of its elements.
…
We shall say, temporarily, that a set $A$ is a successor set if $0 \in A$ and if $x^{+} \in A$ whenever $x \in A$ . In this language the axiom of infinity simply says that there exists a successor set $A$ . Since the intersection of every (non-empty) family of successor sets is a successor set itself (proof?), the intersection of all the successor sets included in $A$ is a successor set $ω$ . The set $ω$ is a subset of every successor set. If, indeed, $B$ is an arbitrary successor set, then so is $A \cap B$ . Since $A \cap B \subset A$ , the set $A \cap B$ is one of the sets that entered into the definition of $ω$ ; it follows that $ω \subset A \cap B$ , and, consequently, that $ω \subset B$ . The minimality property so established uniquely characterizes $ω$ ; the axiom of extension guarantees that there can be only one successor set that is included in every other successor set. A natural number is, by definition, an element of the minimal successor set $ω$ . This definition of natural numbers is the rigorous counterpart of the intuitive description according to which they consist of $0$ , $1$ , $2$ , $3$ , “and so on” (p. 44).

Incidentally, there’s a lot of that pattern of construction in this book. First, establish that there exists some set with some property. Then, use the axiom of specification (or some other means) to pinpoint just its subset with the desired property (excluding potential misc. elements).

12. The Peano Axioms

13. Arithmetic

14. Order

Remember, a partial order on a set $X$ is a reflexive, transitive, and antisymmetric relation in $X$ , while an equivalence relation on $X$ is a reflexive, transitive, and symmetric relation in X.

15. The Axiom of Choice

Starting here, we turn our studies to infinite set theory.

We have seen that if a collection of sets we are choosing from is finite, then the possibility of simultaneous choice is an easy consequence of what we knew before the axiom of choice was even stated; the role of the axiom is to guarantee that possibility in infinite cases...
Every infinite set has a subset equivalent^[2] to $ω$ … A set is infinite if and only if it is equivalent to a proper subset of itself (pp. 60-1).

16. Zorn’s Lemma

17. Well Ordering

18. Transfinite Recursion

19. Ordinal Numbers

The chief application of the axiom of substitution is in extending the process of counting far beyond the natural numbers. From the present point of view, the crucial property of a natural number is that it is a well ordered set^[3] such that the initial segment^[4] determined by each element is equal to that element… An ordinal number is defined as a well ordered set $α$ such that $s (ξ) = ξ$ for all $ξ$ in $α$ ; here $s (ξ)$ is, as before, the initial segment ${η \in α : η < ξ}$ .
An example of an ordinal that is not a natural number is the set $ω$ consisting of all the natural numbers (p. 75).

Ordinal numbers are the rigorous take on the elementary-school game of counting higher than infinity.

Imagine $(N, <)$ : all the naturals, ordered by $<$ . Now imagine $(N, R)$ : all the natural numbers, ordered by $<$ except for $0$ , which is just defined to be higher in the relation $R$ than any other natural. You can see that there’s a similarity^[5] between the naturals beginning at $0$ in $(N, <)$ and the naturals beginning at $1$ in the second ordering. (There exist infinitely many bijections from $N$ to $N$ , of course, but that’s not enough to get a similarity here—we’re specifically after a monotone bijection.) Since $0 \in (N, R)$ is left out of the similarity, the ordinal number similar to $(N, R)$ is larger than the ordinal number similar to $(N, <)$ .^[6]

In ordinal numbers, we express this

ω < ω + 1 < ω + 2 < \dots < ω + ω = ω 2 < ω 2 + 1 < \dots < ω 3 < ω ω = ω^{2} < \dots

(Please note that addition here isn’t quite addition in the familiar sense: $1 + ω = ω < ω + 1$ .)

20. Sets of Ordinal Numbers

21. Ordinal Arithmetic

22. The Schröder–Bernstein Theorem

Here’s the most beautiful proof in the book:

If X and Y are sets such that X is equivalent to a subset of Y, we shall write
$X ≾ Y .$
…say that Y dominates X...
Schröder–Bernstein Theorem. If $X ≾ Y$ and $Y ≾ X$ , then $X \sim Y$ .^[7]
…
Proof. Let $f$ be a one-to-one mapping from $X$ into $Y$ and let $g$ be a one-to-one mapping from $Y$ into $X$ ; the problem is to construct a one-to-one correspondence between $X$ and $Y$ . It is convenient to assume that the sets $X$ and $Y$ have no elements in common; if that is not true, we can so easily make it true that the added assumption involves no loss of generality.^[8]
We shall say that an element $x$ in $X$ is the parent of the element $f (x)$ in $Y$ and, similarly, that an element $y$ in $Y$ is the parent of $g (y)$ in $X$ . Each element $x$ of $X$ has an infinite sequence of descendants, namely, $f (x)$ , $g (f (x))$ , $f (g (f (x)))$ , etc., and similarly, the descendants of an element $y$ of $Y$ are $g (y)$ , $f (g (y))$ , $g (f (g (y)))$ , etc. This definition implies that each term in the sequence is a descendant of all the preceding terms; we shall also say that each term in the sequence is an ancestor of all following terms.
For each element (in either $X$ or $Y$ ) one of three things must happen. If we keep tracing the ancestry of the element back as far as possible, then either we ultimately come to an element of $X$ that has no parent (these orphans are exactly the elements of $X - g (Y)$ ), or we ultimately come to an element of $Y$ that has no parent $(Y - f (X)$ ), or the lineage regresses ad infinitum. Let $X_{X}$ be the set of those elements of $X$ that originate in $X$ (i.e., $X_{X}$ consists of the elements of $X - g (Y)$ together with all their descendants in $X$ ), let $X_{Y}$ be the set of those elements of $X$ that originate in $Y$ (i.e., $X_{Y}$ consists of all the descendants in $X$ of the elements of $Y - f (X)$ ), and let $X_{\infty}$ be the set of those elements of $X$ that have no parentless ancestor. Partition $Y$ similarly into the three sets $Y_{X}$ , $Y_{Y}$ , and $Y_{\infty}$ .
If $x \in X_{X}$ , then $f (x) \in Y_{X}$ , and, in fact, the restriction of $f$ to $X_{X}$ is a one-to-one correspondence between $X_{X}$ and $Y_{X}$ . If $x \in X_{Y}$ , then $x$ belongs to the domain of the inverse function $g^{- 1}$ and $g^{- 1} (x) \in Y_{Y}$ ; in fact the restriction of $g^{- 1}$ to $X_{Y}$ is a one-to-one correspondence between $X_{Y}$ and $Y_{Y}$ . If, finally, $x \in X_{\infty}$ , then $f (x) \in Y_{\infty}$ , and the restriction of $f$ to $X_{\infty}$ is a one-to-one correspondence between $X_{\infty}$ and $Y_{\infty}$ ; alternatively, if $x \in X_{\infty}$ , then $g^{- 1} (x) \in Y_{\infty}$ , and the restriction of $g^{- 1}$ to $X_{\infty}$ is a one-to-one correspondence between $X_{\infty}$ and $Y_{\infty}$ . By combining these three one-to-one correspondences, we obtain a one-to-one correspondence between $X$ and $Y$ (pp. 88-9).

For whatever subjective reason, I really aesthetically enjoy the exhaustion-by-sequence-tracing character of this proof!

23. Countable Sets

24. Cardinal Arithmetic

25. Cardinal Numbers

Cardinals are least ordinals equivalent to some set $X$ .

Note the connection to the discussion of bijections and similarities above, in §19: $(N, <)$ and $(N, R)$ could be equivalent to the same cardinal number but similar to distinct ordinal numbers.

^
The symbol $\emptyset$ is defined to mean the empty set, ${}$ .
^
In the sense of there existing an equivalence relation in that set.
^
A well order on a set $X$ is (1) a partial order that (2) connects any two elements in $X$ (is total) and (3) has a least element for each subset $S \subset X$ .
Thus, partial orders are less demanding than total orders are less demanding than well orders.
^
An initial segment $s (a)$ of a partially ordered set $X$ is the subset of $X$
${x \in X : x < a}, a \in X .$
^
A similarity is a bijection $f (x)$ between sets $X$ and $Y$ such that when $a, b \in X$ , then
$f (a) \leq f (b) if and only if a \leq b .$
Note that the former expression is evaluated in $Y$ and the latter in $X$ .
^
Similarities between two well-ordered sets are unique (p. 72).
^
Two sets $X$ and $Y$ are equivalent, $X \sim Y$ , when there exists a bijection between $X$ and $Y$ . If $X$ and $Y$ are subsets of $Z$ , then equivalence in this sense is an equivalence relation in the power set $P (Z)$ (p. 52).
^
You can easily make overlapping sets disjoint by creating a bijection between each element $x \in X, y \in Y$ and $(x, 0) \in X \times {0}, (y, 1) \in Y \times {1}$ , respectively.