Sums and products
Much of number theory is about trying to understand the interaction between two different structures on the integers (or the natural numbers): the additive structure and the multiplicative structure.
To clarify what I mean, consider the twin prime conjecture, one of the famous open problems in analytic number theory. The conjecture asserts that there are infinitely many twin primes, in other words, primes p such that p+2 is also prime. Being a prime number is a multiplicative property: a natural number ≥2 is defined to be prime if it’s only divisible by itself and 1. In contrast, adding two to a number is an additive operation. Therefore the twin prime conjecture is asking a question about how these two structures interact with each other.
A substantial chunk of the open problems in number theory is about trying to get a better understanding of the interaction between these two structures. In this notebook, I’ll be focusing on one of the most powerful conjectures in this area: the abc conjecture.
Randomness
Before I talk about any specific problems, I’ll take the time to introduce the concept of heuristics based on randomness. These are suggestive probabilistic arguments that “prove” a result under the false assumption that certain events are independent when in fact they are not.
Let’s see how we can “prove” the twin prime conjecture using such methods. Suppose that we pick a large natural number N and we pick a natural number n from the interval [N,2N] uniformly at random. What’s the chance that n and n+2 form a twin prime pair?
Well, we know by the prime number theorem that there are roughly $\sim N/\log N$ primes in this interval, so if we say $E_1$ is the event of $n$ being prime and $E_2$ is the event of $n+2$ being prime, we’ll have roughly $P(E_1) \approx P(E_2) \sim 1/\log N$. We want the probability that both of them are prime, in other words, we want $P(E_1 \cap E_2)$.
So far everything we’ve done is rigorous. Now we’ll introduce the heuristic: we’ll assume that the events $E_1$ and $E_2$ are independent, which amounts to assuming that in this case the additive and multiplicative structures on the integers are “orthogonal” in some sense. Under this assumption, we can deduce
$$P(E_1 \cap E_2) = P(E_1) \cdot P(E_2) \sim \frac{1}{\log^2 N}$$
In other words, the probability of a natural number chosen uniformly at random from $[N, 2N]$ being the smaller of a pair of twin primes is some constant times $1/\log^2 N$. That tells us that there are roughly $N/\log^2 N$ such numbers from $N$ to $2N$, and now it’s easy to see that the total number of twin primes will be infinite since this expression grows without bound as $N \to \infty$.
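To get a feel for the numbers, here’s a minimal Python sketch comparing actual counts in $[N, 2N]$ against these predictions (the cutoffs are arbitrary, and sympy’s isprime is used purely for convenience):

```python
from math import log
from sympy import isprime

def count_in_interval(N):
    """Count primes and twin-prime starts (p with p + 2 also prime) in [N, 2N]."""
    primes = twins = 0
    for n in range(N, 2 * N + 1):
        if isprime(n):
            primes += 1
            if isprime(n + 2):
                twins += 1
    return primes, twins

for N in (10**4, 10**5):
    primes, twins = count_in_interval(N)
    print(f"N={N}: primes={primes} vs N/log N ~ {N / log(N):.0f}; "
          f"twins={twins} vs N/log^2 N ~ {N / log(N) ** 2:.0f}")
```

The prime counts track $N/\log N$ closely, and the twin counts come out at a roughly constant multiple of $N/\log^2 N$ (empirically around $1.32$, the Hardy–Littlewood twin prime constant), which is exactly the shape the heuristic predicts.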
This argument strongly suggests that the twin prime conjecture should hold: if it didn’t hold, there would have to be a strong anticorrelation between the events of n being prime and n+2 being prime.
On the other hand, it’s important to underline the limitations of this kind of argument. Notice that reasoning along the same lines would give us an equally suggestive argument that there ought to be infinitely many triplets of primes: in other words, infinitely many primes $p$ such that both $p+2$ and $p+4$ are also prime. This is, however, false. One way to see that it’s false is to note that no matter what $p$ is, at least one of these three numbers must be divisible by 3. Therefore the only way they can all be prime is if one of them is actually equal to 3, so the only prime triplet is $3, 5, 7$. Here our heuristic fails because we’ve run into a situation in which there is some interaction happening between the additive and multiplicative structures on the integers.
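A two-line check of the mod-3 obstruction, plus a brute-force search to confirm it (the bound $10^5$ is arbitrary):

```python
from sympy import isprime

# Whatever p is mod 3, one of p, p + 2, p + 4 is divisible by 3.
for r in range(3):
    assert any((r + k) % 3 == 0 for k in (0, 2, 4))

# So a brute-force search only ever finds the single triplet (3, 5, 7).
triplets = [(p, p + 2, p + 4) for p in range(2, 10**5)
            if isprime(p) and isprime(p + 2) and isprime(p + 4)]
print(triplets)  # [(3, 5, 7)]
```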
The general rule is that if we know a particular bit of structure is there, we can adjust our heuristics to incorporate it and sharpen up our results. If we’re in the dark, however, then heuristic arguments can’t go beyond being suggestive of a result. For an actual proof we have to demonstrate that our heuristic hasn’t left out any “hidden structure”.
The strongest kind of partial evidence we have for results in number theory is often a suggestive heuristic coupled with similar results being proven in other structures that we might think of as analogous. As I’ll show in the sections that follow, we have both for the abc conjecture.
The abc conjecture
The abc conjecture has a rather peculiar-looking statement. It’s about the simplest equation we can think of: $a + b = c$ where $a, b, c$ are all natural numbers. The conjecture states that for any given real number $\varepsilon > 0$ there are only finitely many solutions of $a + b = c$ such that $a, b, c$ are pairwise coprime and $c > \operatorname{rad}(abc)^{1+\varepsilon}$.
Here $\operatorname{rad}(n)$ is the radical of the natural number $n$. It’s defined to be the product of the distinct primes dividing $n$. For example, $\operatorname{rad}(12) = 6$, $\operatorname{rad}(90) = 30$, et cetera. The “pairwise coprime” condition means that no two of $a, b, c$ share a prime divisor; since $a + b = c$, this is equivalent to just asking that $a$ and $b$ don’t share any prime divisor in common.
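Since we’ll be referring to radicals repeatedly, here’s a small Python helper that transcribes the definition directly:

```python
from math import prod
from sympy import primefactors

def rad(n: int) -> int:
    """Radical of n: the product of the distinct primes dividing n."""
    return prod(primefactors(n))

print(rad(12), rad(90))   # 6 30
print(rad(2**10))         # 2 -- high prime powers have tiny radicals
```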
Now, let’s try to understand what this statement is all about, since on first sight it appears rather bizarre. The intuitive idea of the statement is that almost every solution of the equation $a + b = c$ with $a, b$ coprime should involve a large “mass” of prime factors spread across all of $a, b, c$ if we weight each prime factor by its logarithm (equivalently, its number of digits). The condition that $a, b$ are coprime is essential for this, since otherwise we could simply take $2^n + 2^n = 2^{n+1}$ for any natural number $n$ and we’d have infinitely many counterexamples to the conjecture. Indeed, in this case $\operatorname{rad}(abc) = \operatorname{rad}(2^{3n+1}) = 2$ and so all we have to do is make $n$ very big and we’ll be done.
If you’re wondering about the funny $\varepsilon$ in the exponent, it does need to be there for the statement not to be trivially false. We’ll see the reason behind this when we look at the heuristics behind why we should expect the conjecture to hold, but for the moment it’s easy to give a counterexample to the conjecture where $\varepsilon = 0$: let $p$ be an odd prime and take $a = 1$, $b = 2^{p(p-1)} - 1$, $c = 2^{p(p-1)}$. The radical of the product in this case is clearly $2\operatorname{rad}(b)$, and since $2^{p(p-1)} - 1$ is always divisible by $p^2$ (this is due to Euler’s theorem, as $p(p-1) = \varphi(p^2)$) we get that $\operatorname{rad}(abc) = 2\operatorname{rad}(b) \le 2b/p < 2c/p$. By picking $p$ to be a large prime, we can make the ratio of $c$ to $\operatorname{rad}(abc)$ arbitrarily large, so we indeed can’t remove the $\varepsilon > 0$ from the exponent of the radical.
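We can verify this family concretely, say for $p = 5$ (the choice of prime is arbitrary; rad is the helper defined above):

```python
from math import prod
from sympy import primefactors

def rad(n: int) -> int:
    return prod(primefactors(n))

p = 5
b = 2**(p * (p - 1)) - 1      # 2^20 - 1
c = 2**(p * (p - 1))
assert b % p**2 == 0           # Euler's theorem: p^2 divides 2^(p(p-1)) - 1
print(c / rad(2 * b))          # 2.500..., roughly p/2; grows with p
```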
Morally, the abc conjecture is saying that, in some sense, a and b sharing prime factors is the only obstruction to the prime factors spread across a,b,c behaving in a random fashion. I’ll give a heuristic justification for why randomness should imply the conjecture later—for now let’s investigate some of its consequences.
Perhaps the most famous theorem in all of mathematics is Fermat’s last theorem: the claim that $a^n + b^n = c^n$ has no nonzero integer solutions when $n \ge 3$. It was an extremely difficult theorem to prove, but actually if we assume the abc conjecture the hard part of the problem becomes trivial. This is because the abc conjecture says that for any $\varepsilon > 0$ all but finitely many primitive solutions of the equation (those with $a, b, c$ pairwise coprime) must satisfy
$$c^n < \operatorname{rad}((abc)^n)^{1+\varepsilon} = \operatorname{rad}(abc)^{1+\varepsilon} \le (abc)^{1+\varepsilon} \le c^{3(1+\varepsilon)}$$
and now if we pick some value like $\varepsilon = 1/6$ then we get $c^n < c^{3.5}$, which is impossible for $n \ge 4$ since $c \ge 2$. So the abc conjecture is sufficient to show that there are at most finitely many primitive counterexamples to Fermat’s last theorem for any $n \ge 4$. The conjecture doesn’t quite prove that the only solutions are the trivial ones, but it comes very close to doing so.
In fact the inability of the abc conjecture to get close to proving Fermat’s last theorem in the case $n = 3$ is not surprising: other heuristic methods are also unable to see that the theorem should hold for $n = 3$. For example, a simple heuristic argument is that if we pick $a, b$ to be uniformly distributed on $[N/2, N]$, the probability that $a^3 + b^3$ is a perfect cube should be roughly $\sim 1/N^2$, since $a^3 + b^3$ has size $\sim N^3$ and the cubes near $N^3$ are spaced $\sim N^2$ apart. Since there are $\sim N^2$ pairs $(a, b)$ we can choose, this suggests that we should expect a constant number of counterexamples to Fermat’s last theorem in every such interval $[N/2, N]$, and summing over all such intervals we get an infinite expected value for the total number of counterexamples.
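For what it’s worth, a brute-force check over a few small blocks finds no cube hits at all, a first hint that this heuristic is missing something (the bounds are kept small purely for speed):

```python
from sympy import integer_nthroot

def count_cube_hits(N):
    """Count pairs (a, b) in [N/2, N]^2, a <= b, with a^3 + b^3 a perfect cube."""
    hits = 0
    for a in range(N // 2, N + 1):
        for b in range(a, N + 1):
            _, exact = integer_nthroot(a**3 + b**3, 3)
            hits += exact
    return hits

print([count_cube_hits(N) for N in (100, 200, 400)])  # [0, 0, 0]
```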
It turns out that the structure that is being missed by the heuristic here is that $a^3 + b^3 = c^3$ is actually a (homogenized) elliptic curve. I won’t go into the details here, but it’s another good illustration of how imprudent use of heuristics can lead one to wrong conclusions.
Now that we’ve seen how the conjecture can be useful, let’s get on with the reasons for expecting that it might be true.
Analogous structures
The abc conjecture has an analogue with an easy proof in the case of polynomials over a field of characteristic 0 (this result is known as the Mason–Stothers theorem). If you don’t know what a field of characteristic 0 is, just think of the complex numbers: that’s what I’ll be working with in this section for the sake of concreteness.
The statement has to be modified somewhat when we’re working with polynomials over $\mathbb{C}$. It becomes the following: for all nonconstant polynomials $a(x), b(x), c(x)$, if $\gcd(a, b) = 1$ and $a(x) + b(x) = c(x)$ as polynomials, then $\deg c < \deg \operatorname{rad}(abc)$.
The proof is also surprisingly easy and uses a derivative trick. The key fact is that for a polynomial $p$, the degree of its radical $\deg \operatorname{rad}(p)$ counts the number of its distinct zeroes. In this light, the abc conjecture is saying that the total number of distinct zeroes of $a, b, c$ has to be bigger than the degree of $c$. It turns out that we can detect zeroes with multiplicity by taking the greatest common divisor of a polynomial with its derivative, which gives us $\operatorname{rad}(p) = p/\gcd(p, p')$. (Here the equality can be up to some overall constant—this doesn’t matter for our purposes.)
For example, suppose we’re working with $p(X) = X^2(X+1)^3$. If we differentiate this we get $p'(X) = 2X(X+1)^3 + 3X^2(X+1)^2 = X(X+1)^2(5X+2)$, and if we now take greatest common divisors we get $\gcd(p, p') = X(X+1)^2$. We can then compute the radical by
$$\operatorname{rad}(p) = \frac{p}{\gcd(p, p')} = \frac{X^2(X+1)^3}{X(X+1)^2} = X(X+1)$$
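We can confirm this computation with sympy:

```python
from sympy import symbols, gcd, diff, cancel, factor

X = symbols('X')
p = X**2 * (X + 1)**3
g = gcd(p, diff(p, X))
print(factor(g))              # X*(X + 1)**2
print(factor(cancel(p / g)))  # X*(X + 1) -- the radical of p
```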
The reason this alternative way of computing the radical helps us is that the derivative also behaves well with respect to addition: $(a + b)' = a' + b'$. This means in the case of polynomials we have a tool connecting the additive and multiplicative structures with each other, and this makes our life a lot easier.
Now for the proof: write the original equation as
$$\frac{a}{b} + 1 = \frac{c}{b}$$
and multiply through by $b^2$ after taking derivatives to obtain
$$F = a'b - b'a = c'b - b'c$$
Now, $F$ is divisible by each of $\gcd(a, a')$, $\gcd(b, b')$ and $\gcd(c, c')$, since each of these divides both terms in one of the two expressions for $F$ above. Note also that $F \neq 0$: if $a'b = b'a$ then $(a/b)' = 0$, so $a/b$ would be constant, which is impossible for coprime nonconstant $a, b$. Since $a, b, c$ are all pairwise coprime, the three gcds are pairwise coprime too, so $F$ is actually divisible by their product, and so
$$\deg F \ge \deg \gcd(a, a') + \deg \gcd(b, b') + \deg \gcd(c, c') = \deg a + \deg b + \deg c - \deg \operatorname{rad}(abc)$$
On the other hand, $\deg F \le \deg a + \deg b - 1$, so it follows that
$$\deg c + 1 \le \deg \operatorname{rad}(abc)$$
which is what we wanted to prove.
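As a sanity check, here’s a quick sympy experiment using the toy identity $X^2 + (2X + 1) = (X + 1)^2$, whose terms are nonconstant with the first two coprime:

```python
from sympy import symbols, gcd, diff, cancel, degree, expand

X = symbols('X')

def poly_rad(p):
    # Radical of a polynomial via rad(p) = p / gcd(p, p').
    return cancel(p / gcd(p, diff(p, X)))

a, b = X**2, 2*X + 1
c = expand(a + b)  # equals (X + 1)**2
print(degree(c, X), degree(poly_rad(a * b * c), X))  # 2 3, so deg c < deg rad(abc)
```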
Since polynomial rings over fields are analogous to the integers in many ways, we might think that even if we don’t have access to differentiation as a tool to prove results over the integers, a similar result to the one we’ve proved here should still hold. If it didn’t, there would have to be some relevant difference between the two structures which would explain their divergent properties.
Heuristics
The second source of partial evidence for the abc conjecture comes from the kind of randomness heuristics that I’ve discussed before. I’ll be borrowing heavily from Terence Tao’s excellent post on this subject; you should read it if you want further details about this argument.
The key fact is that if we fix some radical $r$ then there are $N^{o(1)}$ positive integers smaller than $N$ with radical equal to $r$; here $o(1)$ simply means some function of $N$ which goes to $0$ as $N \to \infty$. Now, we fix a large natural number $N$ and model sampling a counterexample to the abc conjecture as follows:
We pick the radicals $r_a, r_b, r_c$ uniformly at random from pairwise coprime squarefree integers in $[1, N]$ subject to $r_a r_b r_c < N^{1-\varepsilon}$.
We pick $c$ from among the numbers in $[N, 2N)$ having radical $r_c$, uniformly at random.
We pick $a, b$ from the numbers in $[1, 2N)$ having radicals $r_a$ and $r_b$ respectively, once again uniformly at random.
We check if a+b=c or not. If so, then we’ve found a counterexample.
Now let’s roughly compute the number of counterexamples we expect from this process in total. The number of triples we’re sampling from in the first step turns out to be $\sim N^{1-\varepsilon} \log^2 N$, so at the expense of making $\varepsilon$ slightly smaller we can throw away the logarithm factor and pretend this is just $\sim N^{1-\varepsilon}$.
In the second and third steps, thanks to the result I cited above, we know that the total number of choices available is $N^{o(1)}$.
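To get a feel for how sparse numbers with a fixed radical are, here’s a quick count (the cutoffs are arbitrary, and rad is the helper from earlier):

```python
from math import prod
from sympy import primefactors

def rad(n: int) -> int:
    return prod(primefactors(n))

# For r = 2 these are the powers of two; for r = 6 the numbers 2^i * 3^j
# with i, j >= 1; and so on. The counts grow polylogarithmically, i.e. N^o(1).
def count_with_radical(r, N):
    return sum(1 for n in range(2, N + 1) if rad(n) == r)

for r in (2, 6, 30):
    print(r, [count_with_radical(r, 10**k) for k in (2, 3, 4)])
```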
Finally, in the last step we’ve found a counterexample if $a + b = c$. Once again we introduce the heuristic assumption here: if $a, b$ are coprime then there’s no structural reason why $a + b$ should be equal to $c$, so since both $a + b$ and $c$ are about as big as $N$, we can say that their probability of being equal is $\sim 1/N$.
This gives us an expected number of counterexamples $\sim N^{1-\varepsilon+o(1)}/N = N^{-\varepsilon+o(1)}$. Now, since this covers all cases where $N \le c < 2N$, we simply take the infinite sum with $N$ ranging over powers of $2$ to find the total number of expected counterexamples to the abc conjecture. Writing $N = 2^k$, this ends up being a geometric-like series with $k$-th term $\sim 2^{(-\varepsilon+o(1))k}$, and so the sum is finite.
Since the sum is finite, this tells us that if the heuristic were a good model, seeing infinitely many counterexamples would be an event of probability zero (a random count with finite expectation is finite almost surely). We can therefore say that this heuristic is suggestive of the abc conjecture being true.
Recent history
So far I’ve only talked about the mathematics behind the conjecture. The recent history is in fact interesting as well.
In 2012, Shinichi Mochizuki claimed that he had proven the conjecture using methods (named inter-universal Teichmüller theory, abbreviated as IUT) that he had developed in a series of relatively obscure papers over the years. Since almost nobody else was familiar with his body of work at the time, it proved to be impossible for independent mathematicians to verify or refute his claim, and to this day it’s an open question whether Mochizuki’s proof is actually valid or not.
The most recent attempt to resolve the issue was by Peter Scholze and Jakob Stix, who traveled to Kyoto in 2018 to meet Mochizuki in person in an attempt to understand his proof better. They came back declaring that Mochizuki’s proof had a fundamental flaw, while Mochizuki accused them of misunderstanding his work. In the end, I believe the meeting changed the minds of few people about whether the proof is correct or not.
After a long delay, Mochizuki’s proof was submitted to Publications of the Research Institute for Mathematical Sciences (RIMS), a journal of which Mochizuki was the chief editor. Mochizuki recused himself from reviewing the paper, but the paper was accepted and published in 2021. It is unusual for a mathematics paper to be published while such substantial doubts about the validity of its argument remain. In the meantime, Mochizuki has continued to publish papers about IUT and his body of work on the subject continues to grow larger with every passing year.
I would have liked to go into Mochizuki’s attempted proof in more detail, but there are probably only a few people in the world who are qualified to speak about this subject and I’m not one of them. I’ll only say that there are domain experts on both sides of the correctness dispute and the issue has by no means been resolved. Personally I believe with some confidence that Mochizuki’s proof isn’t correct, but this is based more on outside view considerations than any understanding of the mathematics involved in the proof.
Forecasts
I think there are two generic but important questions to ask about the conjecture: is it true, and when will it be settled?
On the truth of the conjecture, I can say that the above arguments are quite persuasive to me, and I’m about 95% confident that the conjecture is true. The reason my confidence is so high is that there’s evidence from two directions: both from analogous structures and from heuristic arguments.
As for when it will be settled, this seems less clear to me. I think there is a non-negligible chance that Mochizuki’s proof is actually correct—I would put this at around 10%. If it is not correct, then while the conjecture itself is not that old, the base rate for problems of this difficulty being settled seems low to me—I think the conjecture is of comparable difficulty to the Riemann hypothesis, and the base rate of Millennium Prize problems being settled, estimated using a Jeffreys prior, is ~1%/year. I’ll be more optimistic than this, partly due to considerations about potential growth acceleration in the next century, and put the conditional median somewhere around 2070.
Finally, there is a question that’s peculiar to the abc conjecture, and that’s about Mochizuki’s claimed proof. So far Mochizuki has been adamant that his proof is correct and that those who think otherwise simply misunderstand it. An important question is therefore whether Mochizuki himself will at some point no longer endorse his proof in the form that it was published in RIMS. I think the likelihood of this is rather low but find it hard to go below 10%.
Conclusion
The abc conjecture is uniquely interesting: it’s both one of the most important open problems in number theory and the only case I know of in recent history in which a dispute about the correctness of a proof of such a high-profile problem has persisted for so long.
It’s possible that the controversy will only truly be resolved once computer formalization and verification of proofs advance to the point where most mathematical proofs are verified by computers. We’re currently far from this state of affairs, however, and the dispute is likely to go unresolved for the foreseeable future.