I think I largely agree with this, but I think it’s also pretty hard to put to practice in training an AI proof system.
Fermat’s theorem is actually a great example. Imagine my goal is to “align” a mathematician to solve FLT. Imagine a bizarre counterfactual where we have access to advanced AI models and computational power but, for some reason, we don’t have algebraic number theory or algebraic geometry—they were just never invented for some reason. If you want something less far-fetched: imagine the abc conjecture today if you’re of a camp that believes the inter-universal Teichmuller theory stuff is not going to work out, or imagine perhaps the Collatz conjecture (very easy to state, very hard to make progress on), if you believe that we might need some truly out-there stuff to resolve it. I’ll say “FLT” as a stand-in for “easy to state problem that everyone agrees is important, but is completely resilient to methods that exist(ed) in it’s day.”
So Fermat just died, we opened his notebook, and read his famous “the margins aren’t wide enough” note. We decide that a major goal of the world’s mathematical enterprise is to regenerate his proof, and we set our anachronistic advanced AI systems to work. Most of them, trained on a corpus of Fermat (and Fermat-like authors) text perform a bunch of weird factoring tricks, equation rewrites, reductions mod n, etc. Fairly standard number theory at the time (I think; I’m not a historian).
We have the benefit in our timeline of knowing that a proof of FLT (at least very strongly seems to) require a rather gigantic castle in the sky to be built first. It just doesn’t look like there’s a road to FLT that doesn’t pass through number rings, elliptic curves, modular forms, cohomology theories, schemes, etc.
Suppose that one of our AI systems (one of the ones that’s allowed to try to be creative) starts drifting out of the factoring/rewrites/reduction mod n realm into doing more pie-in-the-sky stuff: it starts inventing words like “primary decomposition” and “Noetherianity / ascending chain conditions” and produces a massive volume of text on something called “Krull dimension.” Should we punish or reward it? A similar AI goes off and produces stuff about “cohesive rings” and “fusible quasitowers” and “entwineable bundles” and such. Should we punish or reward it? Of course, in reality, they’d be taking baby steps, and we’d be punishing/rewarding far before they can start building towers. Perhaps this means we have even less information about where they’re going. (If the math police were in my office punishing/rewarding me after every line I write down, I’d certainly never get anything done, that’s for sure!)
The kinds of people who invented primary decomposition, and the ascending chain condition, and Krull dimension were “well-aligned.” They cared a lot about problems like FLT. In fact, primary decomposition (more or less) directly arose out of various failed proof attempts of FLT, and Noetherianity/ACC was (largely, initially; disclaimer that I am not a historian) a technical tool to get primary decomposition working.
It’s not so easy to run a test like “does this proof shorten the distance to things we care about” because primary decomposition, while a necessary part of the tower, was still upwards of a century away from a proof. We can’t, from Fermat’s time, look at the statement and “obviously” tell that ’it’s going to make a proof of FLT shorter. The farther you climb up the tower (someone in the corner inventing cohomology theories on schemes) the more strained the link to FLT, but all those links do appear to be ultimately necessary. If FLT is the main thing we care about, we are sort of stuck relying on incredibly vague intuitions from people like Noether and Lasker and Krull (and much later Grothendieck and Tate and Shimura and Wiles; and all the people in between) that the tower they’re building might actually be the “shortest bridge” to the elementary-to-state statements we care about (like FLT), even though (pre-Wiles) they have absolutely no way to prove that. They can in no way absolutely prove to their punish/reward trainers that “primary decomposition” and “sheaf cohomology” and “modular forms” aren’t a mis-aligned waste of time. They can argue how it’s related, but they cannot know for a fact that these things will shorten the proof.
So, how do we foster the kinds of thinking like from the Noethers and Laskers and Krulls, who were building a tower based on vague intuitions like “FLT is important and I think this is a part of understanding what’s going on,” while dis-incentivizing the time-wasters who want to invent a random theory of “cohesive rings” or “entwineable bundles” or whatever? Listed side-by-side, there’s not an obvious sticker that says “this one is important for FLT, and this one is useless.” Plausibly, a sufficiently persuasive mathematician (who is sufficiently motivated to obtain grant money) could spin a tale that could convince us that what they’re doing is the most important thing in the world for (*grabs random scrap of paper from a hat*) Fermat’s Last Theorem. If we can’t trust our ability to resist persuasion by mathematicians hungry for grant money, how do we train them to have motivations “aligned with” building towers that contribute to FLT, instead of building towers to nowhere that trip random reward centers.
Thanks for the pointer to davinci-003! I am certainly not interested in ChatGPT specifically, it just happens to be the case that ChatGPT is the easiest to pop open up and start using for a non-expert (like myself). It was fun enough to tinker with, so I look forward to checking out davinci.
I had not heard of GPT-f—appreciate the link to the paper! I’ve seen some lean demonstrations, and they were pretty cool. It did well with some very elementary topology problems (reasoning around the definition of “continuous”), and struggled with analysis in interesting ways. There was some particular theorem (maybe the extreme value theorem? I could be forgetting) that it was able to get in dimension 1 without too much trouble, but that it struggled hard with in dimension 2, in a way that a human would not really struggle with (the proof of this particular in dimension 2, or dimension n for that matter, is very nearly identical at first reading). Breaking down it’s failure, the demonstrators argued pretty convincingly that perhaps there’s actually just honestly more going on in the dimension 2 case than the dimension 1 case that a human prover might be glossing over. The machine can’t say “this set is compact, so the extension of the previous argument is completely analogous/obvious/trivial,” it has to actually go through the details of proving compactness in detail, carrying out the extension, etc.. The details may not be deep (in this particular case), but they are different. I think it would be really cool to see lean-like systems that are capable of making possibly illegitimate logical leaps (canonically: “this is trivial,” or “we leave this as an exercise”) to escape some nasty, tangential sub-sub-argument they’re stuck on, even if they diverge from that point and ultimately get the conclusion wrong (as you say in bullet 3: just run 500 more randomized instances, and see where they go, or inspect the results manually), just to see where they might go next. Regarding code-writing capabilities, it doesn’t seem like there’s anything in principle special about python, and it would be cool to start seeing variants pop up with access to larger mathematics languages (think sage, matlab, mathematica, magma, etc.) that might be more directly useful to automatic proof-writing.
I partially agree and partially disagree on point 3.
Partial agreement: For an application like “I am trying to prove [Theorem X] that I have good reason to believe is completely provable with nothing more than the toolbox already in this paper ([Lemma 1], [Lemma 2], [Lemma 3], …, [Lemma n]) plus standard results in the literature, then I agree, reward it for directions that seem productive and seem likely to shorten the proof, and penalize it for irrelevant tangents. Build a corpus of training theorems that are similar to the things we care about, where we know of the existence of a proof using tools that we know the system is aware of (e.g., don’t train a system on only analysis texts and give a problem that you know to require Galois theory). Some models will produce proofs nearly as short or shorter than the proofs we know about, others will go off on tangents and build castles to nowhere, never really getting around to the point. Clearly we prefer the former. This is likely to be the main near-term application of these systems.
Partial disagreement: For a longer term project, where the system is allowed to make more open ended moves than “help me finish the proof of [Theorem 3.2.6],” I don’t quite agree that it is entirely obvious how to “align” the system to do things that are productive. I expanded a bit on this in another comment in this thread, but imagine a scenario where you have a long (which may mean indefinitely long: maybe 5 years, maybe 20, maybe 200) term project like “prove Fermat’s last theorem” (in a counterfactual where we don’t already know it) or “prove the abc conjecture” or “prove the Collatz conjecture” or something. Some easy to state theorems that are easy for any system competent in basic algebra and number theory manipulations to get started on, easy to generate and test examples, clearly interesting to all parties involves, and of very high interest. We cannot evaluate a system based on whether it produces shorter proofs of FLT in a hypothetical scenario where we don’t already know what the proof of FLT will involve. In the short term, the relative value of “productive-seeming” work (mass volumes of direct manipulations on the equation xn+yn=zn using standard methods that existed in Fermat’s time) vs “irrelevant tangents / castles in the sky” (inventing a theory of primary decompositions, modular forms, schemes; 95% of which is clearly irrelevant to FLT, but access to the broad body of theory appears necessary) is not entirely straightforward, since there is no easy way to test an irrelevant tangent on whether it makes the proof shorter or not. It’s very hard to evaluate whether a research direction is “promising,” and an unfortunate truth is that most of our evaluation comes down to trusting in the intuitions of the researcher that “this is important and relevant to the heart and soul of FLT.”
I think it’s a problem that grant application evaluators face. I want to fund the research team that will most strongly contribute to long term progress on FLT (or pick a problem that seems likely to have a 30 year horizon, rather than a 200 year horizon), and both teams are motivated to convince me they deserve the funding. Even for humans, this is not an easy task. Generally speaking, given a sufficiently technical mathematical paper (or cluster of papers in the authorship graph), there may exist no more than 20 people on Earth that really understand the subject enough to honestly evaluate how relevant that paper is to the grant subject: all of those people are friends of the author (and may be motivated to advance the author’s career by helping them secure a big grant, irrespective of the abstract relevance of the research; one author in their network’s success contributes to their own success, via collaborations, citations, etc.), and none of whom are evaluating the grant applications. That is, someone is building a tower / castle in the sky, and the only people who really understand the castle are personal friends of the builder. We are placing a lot of trust in the builder (and their authorship network) when we choose to fund their project, like it or not. So, here the research teams are two (or a thousand) competing parameterizations of some model in a training environment, I may strongly suspect that massive towers will have to be built, but I don’t want to trust my own intuitions about which towers are useful or not. I want to at least leave open the possibility that we (like Fermat’s contemporaries) have just no idea what needs to be done to make progress, and that we should allow “well-aligned” crazy ideas to flourish. I have to (at least partially) trust that the tower-builders genuinely “believe” on some internal level that they’re building in the right direction.
I of course agree entirely that systems that are advanced enough for the above to actually be a problem worth intensely caring about are quite a ways off, but they will come to exist sooner or later, and I don’t think it’s obvious how to get those systems building in productive directions.