Note: real analysis is not on the MIRI reading list (although I think it should be).

Foreword

As a young boy, mathematics captivated me.

In elementary school, I’d happily while away entire weekends working through the next grade’s math book. I was impatient.

In middle school, I’d lazily estimate angles of incidence that would result if I shot lasers from my eyes, tracing their trajectories within the classroom and out down the hallway. I was restless.

In high school, I’d daydream about what would happen to integrals as I twisted functions in my mind. I was curious.

And now, I get to see how it’s all put together. Imagine being fascinated by some thing, continually glimpsing beautiful new facets and sampling exotic flavors, yet being resigned to not truly pursuing this passion. After all, I chose to earn a computer science degree.

Wait.

Analysis I

As in Linear Algebra Done Right, I completed every single exercise in the book—this time, without looking up any solutions (although I did occasionally ask questions on Discord). Instead, I came back to problems if I couldn’t solve them after half an hour of effort.

A sampling of my proofs can be found here.

1: Introduction

2: The Natural Numbers

In which the Peano axioms are introduced, allowing us to define addition and multiplication on the natural numbers ${0, 1, 2, \dots}$ .

3: Set Theory

In which functions and Cartesian products are defined, among other concepts.

Recursive Nesting

How can you apply the axiom of foundation if sets are nested in each other? That is, how can the axiom of foundation “reach into” sets like $A = {B, \dots}$ and $B = {A, \dots}$ ?

Show that if $A$ and $B$ are two sets, then either $A \notin B$ or $B \notin A$ (or both).

Proof. Suppose $A \in B$ and $B \in A$ . By the pairwise axiom, we know that there exists a set $S = {A, B}$ . We see that there does not exist an $S^{'} \in S$ such that $S^{'} \cap S = \emptyset$ . That is, if we choose $A$ , one of its elements is $B$ , which is also an element of $S$ - this violates the axiom of foundation. The same reasoning applies if we choose $B$ . Then $\neg (A \in B \land B \in A)$ , so either $A$ or $B$ (or both) is not an element of the other.

4: Integers and Rationals

In which the titular sets are constructed, allowing the exploration of absolute value, exponentiation, and the incompleteness (colloquially, the “gaps”) of the rationals.

Readers newer to mathematics may find it interesting that even though there are (countably) infinitely many rational numbers between any two distinct rationals, the rationals still contain gaps.

5: The Real Numbers

In which Cauchy sequences allow us to formally construct the reals. $^{1}$

6: Limits of Sequences

In which we meet convergence and its lovely limit laws, extend the reals to cover infinities, experience the delightfully-counterintuitive $limsup$ and $liminf$ , and complete our definition of real exponentiation.

Upper-Bounded Monotonic Sequence Convergence

I tried to come up with a clever title here—I really did. Apparently even my punmaking abilities are bounded.

Suppose you have a monotonically increasing sequence $(x_{n})_{n = 0}^{\infty}$ with an upper bound $M \in R$ . Then the sequence converges; this also applies to lower-bounded monotonic decreasing sequences.

Weird, right? I mean, even though the sequence monotonically increases and there’s an upper bound, there are still uncountably infinitely many “places” the sequence can “choose” to go. So what gives?

Proof. Let $L \in [x_{0}, M]$ and $ϵ > 0$ . Suppose that the sequence is not eventually $ϵ$ -close to $L$ . Let $K \in N$ be such that for all $k \geq K$ , either $L + ϵ < x_{k} \leq M$ or $x_{k} < L - ϵ \leq M$ ; we know that $K$ exists because the sequence is monotone increasing. By the Archimedean principle, there exists some $N \in N$ such that $N ϵ > M - x_{0}$ .

Since the sequence is monotone increasing, by repeating the above argument $N$ times in the first case, we have that $M < L + N ϵ < x_{k} \leq M$ , which is contradictory. By repeating the argument $N$ times in the second case, we have $x_{k} < L - N ϵ < x_{0}$ , which contradicts the fact that the sequence is monotone increasing. Then for any $ϵ > 0$ , the sequence must be eventually $ϵ$ -close to some $L \in [x_{0}, M]$ . Intuitively, for any given $ϵ > 0$ , the sequence can only “escape” a limit a finite number of times before it runs out of room and has to be $ϵ$ -close.

Next, we show that the $L_{ϵ}$ ’s form a Cauchy sequence. Let $ϵ_{3} > 0$ , and set $ϵ_{1}, ϵ_{2}$ such that $0 < ϵ_{1}, ϵ_{2} \leq \frac{ϵ_{3}}{2}$ . $(x_{n})$ is eventually $ϵ_{1}$ -close to $L_{ϵ_{1}}$ , so there exists a $K_{1} \in N$ such that for all $k \geq K_{1}$ we have $| x_{k} - L_{ϵ_{1}} | < ϵ_{1}$ . Similar arguments hold for $ϵ_{2}$ . Set $K = max (K_{1}, K_{2}) + 1$ , now $| L_{ϵ_{1}} - L_{ϵ_{2}} | \leq | x_{K} - L_{ϵ_{1}} | + | x_{K} - L_{ϵ_{2}} | < ϵ_{1} + ϵ_{2} \leq ϵ_{3}$ . But $ϵ_{3}$ is arbitrary, so we can easily see that the sequence $(L_{\frac{1}{n}})_{n = 1}^{\infty}$ is Cauchy.

As the real numbers are complete, this sequence converges to some $L_{\infty} \in R$ . Since the main sequence is eventually $\frac{1}{n}$ -close to $L_{\frac{1}{n}}$ , and $L_{\frac{1}{n}}$ converges to $L_{\infty}$ , by the triangle inequality we have that the main sequence converges to $L_{\infty}$ .

7: Series

In which we finally reach concepts from pre-calculus.

Condensation

The Cauchy condensation test says that for a sequence $(a_{n})_{n = 1}^{\infty}$ where $a_{n} \geq 0$ and $a_{n + 1} \leq a_{n}$ for all $n \geq 1$ , the series $\sum_{n = 1}^{\infty} a_{n}$ converges iff the series $\sum_{k = 0}^{\infty} 2^{k} a_{2^{k}}$ converges. Using this result, we have that the harmonic series $\sum_{n = 1}^{\infty} \frac{1}{n}$ diverges; the partial sums $\sum_{n = 1}^{x} \frac{1}{n}$ are given below.

What was initially counterintuitive is that even though ${lim}_{n \to \infty} \frac{1}{n} = 0$ , the series doesn’t converge. The best intuition I’ve come up with is that the harmonic series doesn’t “deplete” its domain quickly enough, so you can get arbitrarily large partial sums.

If you want proofs, here are twenty!

8: Infinite Sets

In which uncountable sets $^{2}$ , the axiom of choice, and ordered sets brighten our lives.

9: Continuous Functions on $R$

In which continuity, the maximum principle, and the intermediate value theorem make their debut.

Lipschitz Continuity $⇎$ Uniform Continuity

If a function $f : X \to R$ ( $X \subseteq R$ ) is Lipschitz-continuous for some Lipschitz constant $M$ , then by definition we have t for every $x, y \in X$ ,

\frac{| f (x) - f (y) |}{| x - y |} \leq M .

The definition of uniform continuity is

For every $ϵ > 0$ , there exists a $δ > 0$ such that for all $x, y \in X$ such that $| x - y | < δ$ , $| f (x) - f (y) | < ϵ$ .

Lipschitz continuity implies uniform continuity (do you see why?), but the converse is not true. I mean, what kind of twisted person would come up with this kind of function?

10: Differentiation of Functions

In which the basic rules of differential calculus are proven.

You know, I actually thought that I wouldn’t have too much to explain in this post—the book went very smoothly up to this point. On the upside, we get to spend even more time together!

Differential Intuitions

Let me simply direct you to this excellent StackExchange answer.

pǝɹᴉʌɐʇᴉʌǝs

We can understand $(f^{- 1})^{'} (x) = \frac{1}{f^{'} (x)}$ by simply thinking about $\frac{r u n}{r i s e} = \frac{1}{\frac{r i s e}{r u n}}$ , which makes sense for the derivative of the inverse!

L’Hôpital’s Rule

Consider $f, g : [a, b] \to R$ differentiable on $(a, b]$ (for real numbers $a < b$ ). Then if $f (a) = g (a) = 0, g^{'} (x) \neq 0$ for $x \in [a, b]$ , and ${lim}_{x \to a; x \in (a, b]} \frac{f^{'} (x)}{g^{'} (x)} = L \in R$ , we have that $g (x) \neq 0$ for $x \in (a, b]$ and ${lim}_{x \to a; x \in (a, b]} \frac{f (x)}{g (x)} = L$ .

As a neat exercise, let’s see how this rule breaks if we violate preconditions:

If $f (a)$ or $g (a) \neq 0$ , then the ratio is “messed up” and not necessarily indicative of the functions’ slopes as $a$ is approached.
If $f$ or $g$ is not differentiable on $(a, b]$ , then perhaps
- No, but really—you would use L’Hôpital’s rule to analytically determine that the limit in question ( ${lim}_{x \to 0} \frac{ln (1 - x) - sin x}{1 - {cos}^{2} x}$ ) does not exist.
If $g^{'} (x) = 0$ for some $x \in [a, b]$ , then we have division by zero (unless $x = a$ , in which case we find more twisted counterexamples which necessitate the closure of this interval).

11: The Riemann Integral

In which partitions and piecewise constant functions help us define the Riemann integral, which later leads to the Riemann-Stieltjes integral and the fundamental theorems of calculus.

Having taken care of the exposition, we arrive at the Rivendell of real analysis, preparing ourselves for the arduous journey to Mt. Lebesgue.

Pointless Integration

There is zero area under a point (or even infinitely many points, such as $N$ ) due to how we define length, which in turn allows us to build from piecewise Riemann integrals to the real deal.

Infinite Partitions?

The upper and lower Riemann integrals can be defined as the infimum and supremum of the upper and lower Riemann sums, respectively. It is important to note that even though that for many functions (such as $f (x) = x$ ), further refinement of the partition always gets you closer to the extremum, the result is not an “infinite” partition (which is not defined according to this construction).

Consider the curried function $g_{f} : [P] \to R$ , which takes a partition and computes its corresponding Riemann sum with respect to the predefined function. Then clearly this function is monotonic with respect to the refinement of the partition; the extremum is not necessarily achieved by any given partition in the refinement sequence, but rather the closest bound on what you can get with any partition.

Riemann-Stieltjes Confusionn

The book doesn’t lay it out cleanly, so I will: the Riemann-Stieltjes integral allows us to use custom length functions to weight different parts of the function differently. I recommend working through a simple case like $α (x) = x^{2}$ in your head: $\int_{1}^{3} x d α$ (how do the piecewise constant Riemann-Stieltjes integrals of majorizing and minorizing functions change as you iteratively refine the coarsest partition possible?).

This is particularly useful in defining expectation in statistics:

\int_{- \infty}^{\infty} x d F_{X} (x) .

The Riemann integral is recovered as the special case where $α (x) = x$ .

Final Verdict

Terence Tao is both an incredible mathematician and writer, and it shows. There simply weren’t many things which confused me, and that says more about his writing than it does about me. The exercises are appropriate and hew closely to each chapter’s content; often, the reader proves key results.

My only complaint is that results are frequently referred to as “from Proposition 3.6.4 and Theorem 3.6.12” many chapters later, forcing the reader to infer the referents or backtrack all of the way. In a sense, this may be helpful—you don’t want to backtrack a billion pages, so you try to fill in the blanks. In another sense, it’s annoying.

In my opinion, Analysis I belongs near the very beginning of the research guide. It’s a wonderful introduction to proof-based mathematics, with a helpful appendix clearing up concepts I had previously only picked up via osmosis. Additionally, I met with a deep learning professor at my university, asking them “if I want to be able to understand and potentially make progress on some of the fundamental issues in machine learning, do I need to know real analysis?”. Their answer: a definitive yes.

Forwards

Next up is Analysis II, which works from metric spaces all the way up to the Lebesgue integral. Additionally, I’m going to start working through Sutton and Barto’s Reinforcement Learning: An Introduction to increase the rigor with which I think about RL and in preparation for my work this summer.

I also think I need to run through some applied Calculus to refresh my reflexes (so I don’t have to rederive every identity that I can’t remember).

Tips

With respect to my gripe about result numbering, writing down both the results and their numbers in your notes may save you some headaches.
Appendix A is extremely useful for those new to proofs.
When reading textbooks, your priors should be towards your being wrong—following this intuition will allow you to unlock new abilities, rather than glossing over your (probable) incorrectness and having it blow up in your face later.

Marginal Attention

In the last pages of CFAR’s participant handbook is an entry on marginal attention. Essentially, each bit of extra attention contributes more than the last. If you’re totally dialed in on a task and get slightly distracted, that is far more disastrous than getting slightly more distracted while your attention is already somewhat unfocused.

I often simply left my phone at home and accompanied my Kindle to an empty classroom. This worked wonderfully; I suspect it more than doubled my hourly learning efficiency.

Proving Myself

Just over three months ago, I wrote:

Proofs remain inordinately difficult for me, although I have noticed a small improvement. To do MIRI-relevant math, proofs will need to become second nature. Depending on how I feel as I progress through my next book (which will likely be a proof-centric linear algebra tome), I’ll start trying different supplemental approaches for improving my proof prowess.

I have resolved that by the completion of my next book review, proofs will be one of my strong suits.

And now, I think that my proofs are getting pretty good. When confronting a problem, I feel a certain mental lightness—a readiness to use my full repertoire of knowledge, to strike true down those paths likely to bring the proof to completion. Although my capacities remain faint shadows of what I hope to achieve within the coming year, my progress has been substantial.

Talk is cheap, and you probably don’t feel like navigating to the selection of proofs I provided, so let me share with you my favorite proof from this book.

The following problem was admittedly confusing at first, but I had an overwhelmingly strong sense that the statement had to be true. I thought again and again about why; once that came to me, I wrote it all at once, and beamed.

Local Extrema are Stationary: let $a < b$ be real numbers, and let $f : (a, b) \to R$ be a function. If $x_{0} \in (a, b)$ , $f$ is differentiable at $x_{0}$ , and $f$ attains either a local maximum or local minimum at $x_{0}$ , then $f^{'} (x_{0}) = 0$ .

Proof. Suppose $f (x_{0})$ is a local maximum; thus, for all $x \in (a, b)$ , $f (x_{0}) \geq f (x)$ . Let $L = f^{'} (x_{0})$ ; we know that $L$ exists and is a real number since $f$ is differentiable at $x_{0}$ . By the trichotomy of real numbers, $L$ is either negative, positive, or zero.

Suppose that $L$ is negative—then consider ${lim}_{x \to x_{0}^{-}; x \in X - {x_{0}}} \frac{f (x) - f (x_{0})}{x - x_{0}}$ (this is permissible as the left and right limits of a convergent limit are equal); we have a sequence $(x_{n})_{n = 1}^{\infty}$ of $x_{n} < x_{0}$ which converges to $x_{0}$ , so every term in $(x_{n} - x_{0})_{n = 1}^{\infty}$ is negative.

If $(f (x_{n}) - f (x_{0}))_{n = 1}^{\infty}$ has no positive terms, then each term in $(\frac{f (x_{n}) - f (x_{0})}{x - x_{0}})_{n = 1}^{\infty}$ must be positive, so $L \geq 0$ , contradicting our assumption that $L < 0$ . Then by the properties of convergent limits, there must be infinitely many $x_{n}$ such that $f (x_{n}) - f (x_{0})$ is positive. Therefore, these $f (x_{n}) > f (x_{0})$ , contradicting the fact that $f (x_{0})$ is a local maximum on $(a, b)$ . Then $L$ cannot be negative.

A similar proof holds for $L > 0$ , so $L = 0$ .

To solve for local minimum $f (x_{0})$ , define $g (x) := - f (x)$ and use the above result on local maximum $g (x_{0})$ .

If you are interested in working with me or others on the task of learning MIRI-relevant math, if you have a burning desire to knock the alignment problem down a peg—I would be more than happy to work with you. Messaging me may also net you an invitation to the MIRIx Discord server.

$^{1}$ Constructing the reals from first principles was profoundly enjoyable. Working through this book gave me a sense of certitude when dealing with math. Numbers are no longer simply familiar friends following familiar rules, but rather objects I know how to construct. It feels great.

$^{2}$ Gather round, gather round—for it is on this blessed morn/day/evening that I recount my youthful dalliances with uncountable infinities!

In my first term of college, I read Gödel, Escher, Bach as part of a wonderful tutorial class. I came across sizes of infinities, and, like many, my mind absolutely refused. However, I wanted to understand what was really going on so intensely that I spent many hours working through the intuitions on my own (not knowing much of anything about formal mathematics). This period of discovery was one of my favorite experiences of my undergraduate career; I can’t tell you how many nights I just sat under the stars, thinking. Eventually, I came to the conclusion that of course there are multiple sizes of infinities.

Now, maybe it would have been faster to just learn the math behind diagonalization or some other method of proof, but I think there was tremendous value in learning to fall in love with the process—to commit yourself fully to the joy of discovery and thought.

I can certainly tell you that I wouldn’t have made it so far so quickly down the research list if this journey didn’t feel like one of the most beautiful things I’ve ever done.

Into the Kiln: Insights from Tao’s ‘Analysis I’