I think I largely agree with this, but it’s also pretty hard to put into practice when training an AI proof system.
Fermat’s theorem is actually a great example. Imagine my goal is to “align” a mathematician to solve FLT. Imagine a bizarre counterfactual where we have access to advanced AI models and computational power but, for some reason, algebraic number theory and algebraic geometry were simply never invented. If you want something less far-fetched: imagine the abc conjecture today, if you’re in the camp that believes the inter-universal Teichmüller theory stuff is not going to work out, or perhaps the Collatz conjecture (very easy to state, very hard to make progress on), if you believe we might need some truly out-there machinery to resolve it. I’ll use “FLT” as a stand-in for “an easy-to-state problem that everyone agrees is important, but that is completely resilient to the methods that exist(ed) in its day.”
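(For concreteness, since I’m leaning on “easy to state” a lot: FLT says that for any integer $n > 2$ the equation $x^n + y^n = z^n$ has no solutions in positive integers, and the Collatz conjecture says that iterating $k \mapsto k/2$ for even $k$ and $k \mapsto 3k+1$ for odd $k$ eventually reaches $1$ from every positive integer. Nothing deeper than that is needed to state either one.)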
So Fermat just died, we opened his notebook, and read his famous “the margins aren’t wide enough” note. We decide that a major goal of the world’s mathematical enterprise is to regenerate his proof, and we set our anachronistic advanced AI systems to work. Most of them, trained on a corpus of text from Fermat (and Fermat-like authors), perform a bunch of weird factoring tricks, equation rewrites, reductions mod n, etc.: fairly standard number theory for the time (I think; I’m not a historian).
We have the benefit, in our timeline, of knowing that a proof of FLT (at least very strongly) seems to require a rather gigantic castle in the sky to be built first. It just doesn’t look like there’s a road to FLT that doesn’t pass through number rings, elliptic curves, modular forms, cohomology theories, schemes, etc.
Suppose that one of our AI systems (one of the ones that’s allowed to try to be creative) starts drifting out of the factoring/rewrites/reduction-mod-n realm into doing more pie-in-the-sky stuff: it starts inventing words like “primary decomposition” and “Noetherianity / ascending chain conditions” and produces a massive volume of text on something called “Krull dimension.” Should we punish or reward it? A similar AI goes off and produces stuff about “cohesive rings” and “fusible quasitowers” and “entwineable bundles” and such. Should we punish or reward it? Of course, in reality, they’d be taking baby steps, and we’d be punishing/rewarding long before they could start building towers. Perhaps this means we have even less information about where they’re going. (If the math police were in my office punishing/rewarding me after every line I write down, I’d certainly never get anything done, that’s for sure!)
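(For readers who haven’t met them, these are standard commutative-algebra notions, which is rather the point: the ascending chain condition says that any chain of ideals $I_1 \subseteq I_2 \subseteq I_3 \subseteq \cdots$ in a ring eventually stabilizes, and the Krull dimension of a ring $R$ is the supremum of the lengths $n$ of chains of prime ideals $\mathfrak{p}_0 \subsetneq \mathfrak{p}_1 \subsetneq \cdots \subsetneq \mathfrak{p}_n$. Neither statement has anything visibly Fermat-flavored about it.)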
The kinds of people who invented primary decomposition, and the ascending chain condition, and Krull dimension were “well-aligned.” They cared a lot about problems like FLT. In fact, primary decomposition (more or less) directly arose out of various failed proof attempts of FLT, and Noetherianity/ACC was (largely, initially; disclaimer that I am not a historian) a technical tool to get primary decomposition working.
It’s not so easy to run a test like “does this proof shorten the distance to things we care about,” because primary decomposition, while a necessary part of the tower, was still upwards of a century away from a proof. We can’t, from Fermat’s time, look at the statement and “obviously” tell that it’s going to make a proof of FLT shorter. The farther you climb up the tower (someone in the corner inventing cohomology theories on schemes), the more strained the link to FLT, but all those links do appear to be ultimately necessary. If FLT is the main thing we care about, we are sort of stuck relying on incredibly vague intuitions from people like Noether and Lasker and Krull (and much later Grothendieck and Tate and Shimura and Wiles, and all the people in between) that the tower they’re building might actually be the “shortest bridge” to the elementary-to-state statements we care about (like FLT), even though (pre-Wiles) they have absolutely no way to prove that. They can in no way absolutely prove to their punish/reward trainers that “primary decomposition” and “sheaf cohomology” and “modular forms” aren’t a misaligned waste of time. They can argue how it’s related, but they cannot know for a fact that these things will shorten the proof.
So, how do we foster the kind of thinking we got from the Noethers and Laskers and Krulls, who were building a tower based on vague intuitions like “FLT is important and I think this is a part of understanding what’s going on,” while disincentivizing the time-wasters who want to invent a random theory of “cohesive rings” or “entwineable bundles” or whatever? Listed side by side, they don’t come with an obvious sticker that says “this one is important for FLT, and this one is useless.” Plausibly, a sufficiently persuasive mathematician (who is sufficiently motivated to obtain grant money) could spin a tale that convinces us that what they’re doing is the most important thing in the world for (*grabs random scrap of paper from a hat*) Fermat’s Last Theorem. If we can’t trust our ability to resist persuasion by mathematicians hungry for grant money, how do we train them to have motivations “aligned with” building towers that contribute to FLT, instead of building towers to nowhere that trip random reward centers?