the use of ChatGPT is a bad idea. ChatGPT greatly underperforms what we know GPT and similar models are capable of in the inner-monologue papers on many math-related questions (particularly after instruction tuning), almost all of them far harder than the CRT. You are wasting your time doing math with ChatGPT unless you are interested in specifically ChatGPT, such as looking into its mode collapse. Which you aren’t. (Remember: “sampling can prove the presence of knowledge, but not the absence”—especially after RL or finetuning intended to change drastically what samples come out!)
You would be better off tinkering with davinci-003 or possibly a codex model (and making sure to try out best-of=20 to avoid being fooled by stochasticity of sampling and see what the ‘best’ results really are).
these systems are already combinable. GPT-f (and later Lean work) benefits from its pretraining on natural language math corpuses, which provides a powerful prior for its ATP. The inner-monologue papers show benefits from a LM thinking about a problem and writing Python code to solve the problem for it (exploiting increasingly-amazing code-writing capabilities). And there’s the autoformalization with LMs which I haven’t yet caught up to in my reading but looks pretty exciting, and Szegedy is excited too.
the ‘math alignment’ problem is about as hard as the ‘image generator alignment problem’, which is to say, little at all like ‘the AI alignment problem’.
The great thing about it is, if you don’t like a result from GPT-f or Minerva or the auto-formalizing models, you just throw it away & try again, no harm done beyond some wasted electricity. (That’s not the case for autonomous agents.) As the joke goes, mathematicians are great—all they need is a chalkboard and a waste basket. So if they build up castles in the sky, that’s not a problem as long as it still proves useful for end applications: does access to those theorems shorten proofs of theorems we care about? Do models trained with that produce better proofs? etc. You can feed them back in and keep going. (This is a common trick now with image-gen: as long as you can generate some images that you want, you can bootstrap by training a DreamBooth or Textual-Inversion or DreamArtist or the finetune du jour on those, generate, select, re-finetune etc.) You can simply keep adding rewards for human-directed uses and penalties for irrelevant tangents and adjusting and doing end-to-end training, RL training if necessary. What they ‘want’ will be defined by the rewards/penalties.
So I don’t see much of a ‘math alignment’ problem beyond a relatively easy engineering problem—the real challenge remains getting systems whose ‘math alignment’ you’d care about at all. (Any threat comes from integrating them into self-improving systems such that they might be doing mathematical research into how to make themselves more powerful, but that requires some degree of agency or RL setting, not just a model in a GPU tracing out a tree.)
from a RL perspective, it’s not absurd at all to suggest that exploration and novelty drives can be meta-learned from a few positive outcomes that yield ground-truth rewards. It’s meta-learning all the way down/up and reward is enough. I can totally believe that math models can do the same thing, learning preferences/esthetics towards work that will one day be useful, based on techniques like PBT selection for models which learn drives towards useful work or training on past human work to predict utility etc.
Thanks for the pointer to davinci-003! I am certainly not interested in ChatGPT specifically, it just happens to be the case that ChatGPT is the easiest to pop open up and start using for a non-expert (like myself). It was fun enough to tinker with, so I look forward to checking out davinci.
I had not heard of GPT-f—appreciate the link to the paper! I’ve seen some lean demonstrations, and they were pretty cool. It did well with some very elementary topology problems (reasoning around the definition of “continuous”), and struggled with analysis in interesting ways. There was some particular theorem (maybe the extreme value theorem? I could be forgetting) that it was able to get in dimension 1 without too much trouble, but that it struggled hard with in dimension 2, in a way that a human would not really struggle with (the proof of this particular in dimension 2, or dimension n for that matter, is very nearly identical at first reading). Breaking down it’s failure, the demonstrators argued pretty convincingly that perhaps there’s actually just honestly more going on in the dimension 2 case than the dimension 1 case that a human prover might be glossing over. The machine can’t say “this set is compact, so the extension of the previous argument is completely analogous/obvious/trivial,” it has to actually go through the details of proving compactness in detail, carrying out the extension, etc.. The details may not be deep (in this particular case), but they are different. I think it would be really cool to see lean-like systems that are capable of making possibly illegitimate logical leaps (canonically: “this is trivial,” or “we leave this as an exercise”) to escape some nasty, tangential sub-sub-argument they’re stuck on, even if they diverge from that point and ultimately get the conclusion wrong (as you say in bullet 3: just run 500 more randomized instances, and see where they go, or inspect the results manually), just to see where they might go next. Regarding code-writing capabilities, it doesn’t seem like there’s anything in principle special about python, and it would be cool to start seeing variants pop up with access to larger mathematics languages (think sage, matlab, mathematica, magma, etc.) that might be more directly useful to automatic proof-writing.
I partially agree and partially disagree on point 3.
Partial agreement: For an application like “I am trying to prove [Theorem X] that I have good reason to believe is completely provable with nothing more than the toolbox already in this paper ([Lemma 1], [Lemma 2], [Lemma 3], …, [Lemma n]) plus standard results in the literature, then I agree, reward it for directions that seem productive and seem likely to shorten the proof, and penalize it for irrelevant tangents. Build a corpus of training theorems that are similar to the things we care about, where we know of the existence of a proof using tools that we know the system is aware of (e.g., don’t train a system on only analysis texts and give a problem that you know to require Galois theory). Some models will produce proofs nearly as short or shorter than the proofs we know about, others will go off on tangents and build castles to nowhere, never really getting around to the point. Clearly we prefer the former. This is likely to be the main near-term application of these systems.
Partial disagreement: For a longer term project, where the system is allowed to make more open ended moves than “help me finish the proof of [Theorem 3.2.6],” I don’t quite agree that it is entirely obvious how to “align” the system to do things that are productive. I expanded a bit on this in another comment in this thread, but imagine a scenario where you have a long (which may mean indefinitely long: maybe 5 years, maybe 20, maybe 200) term project like “prove Fermat’s last theorem” (in a counterfactual where we don’t already know it) or “prove the abc conjecture” or “prove the Collatz conjecture” or something. Some easy to state theorems that are easy for any system competent in basic algebra and number theory manipulations to get started on, easy to generate and test examples, clearly interesting to all parties involves, and of very high interest. We cannot evaluate a system based on whether it produces shorter proofs of FLT in a hypothetical scenario where we don’t already know what the proof of FLT will involve. In the short term, the relative value of “productive-seeming” work (mass volumes of direct manipulations on the equation xn+yn=zn using standard methods that existed in Fermat’s time) vs “irrelevant tangents / castles in the sky” (inventing a theory of primary decompositions, modular forms, schemes; 95% of which is clearly irrelevant to FLT, but access to the broad body of theory appears necessary) is not entirely straightforward, since there is no easy way to test an irrelevant tangent on whether it makes the proof shorter or not. It’s very hard to evaluate whether a research direction is “promising,” and an unfortunate truth is that most of our evaluation comes down to trusting in the intuitions of the researcher that “this is important and relevant to the heart and soul of FLT.”
I think it’s a problem that grant application evaluators face. I want to fund the research team that will most strongly contribute to long term progress on FLT (or pick a problem that seems likely to have a 30 year horizon, rather than a 200 year horizon), and both teams are motivated to convince me they deserve the funding. Even for humans, this is not an easy task. Generally speaking, given a sufficiently technical mathematical paper (or cluster of papers in the authorship graph), there may exist no more than 20 people on Earth that really understand the subject enough to honestly evaluate how relevant that paper is to the grant subject: all of those people are friends of the author (and may be motivated to advance the author’s career by helping them secure a big grant, irrespective of the abstract relevance of the research; one author in their network’s success contributes to their own success, via collaborations, citations, etc.), and none of whom are evaluating the grant applications. That is, someone is building a tower / castle in the sky, and the only people who really understand the castle are personal friends of the builder. We are placing a lot of trust in the builder (and their authorship network) when we choose to fund their project, like it or not. So, here the research teams are two (or a thousand) competing parameterizations of some model in a training environment, I may strongly suspect that massive towers will have to be built, but I don’t want to trust my own intuitions about which towers are useful or not. I want to at least leave open the possibility that we (like Fermat’s contemporaries) have just no idea what needs to be done to make progress, and that we should allow “well-aligned” crazy ideas to flourish. I have to (at least partially) trust that the tower-builders genuinely “believe” on some internal level that they’re building in the right direction.
I of course agree entirely that systems that are advanced enough for the above to actually be a problem worth intensely caring about are quite a ways off, but they will come to exist sooner or later, and I don’t think it’s obvious how to get those systems building in productive directions.
Yeah, that’s the problem with ChatGPT: it’s so easy to use, and so good within a niche, we’re right back to 2020 where everyone is trying the first thing that comes to mind and declaring that GPT is busted if it doesn’t (won’t) do it. Heck, ChatGPT doesn’t even let you set the temperature! (This is on top of the hidden prompt, RLHF, unknown history mechanism, and what seems to be at least 1 additional layer of filter, like the string-matching that the DALL-E 2 service uses to censor and decide when to apply ‘diversity prompt’.) ‘Deep learning is hitting a wall’ etc...
Just remember anytime anyone uses ChatGPT to declare that “DL can’t X”: “sampling can show the presence of knowledge but not the absence.”
Python is special in that there’s a ton of it as data, and so it’s probably the single best language in any code-generating model. Using the special-purpose languages will cost you, making it a classic bias-variance tradeoff. That data richness won’t go away until you have full bootstrapping systems—not the fairly limited one-step self-distillation or exploration that we have now (eg they can generate Python puzzles to train themselves but the bootstrap seems to only work once or twice).
It’s not easy, but I think we have lots of ideas and I already covered them in my final bullet point.
There’s a rich vein of ideas in reinforcement learning & decision theory & evolutionary computation about how to do novelty search or meta-learn exploration/intrinsic-curiosity drives which balance explore-exploit. If you train models based on reward and do things like evolutionary selection, you can develop models which have ‘taste’ in terms of making short-term actions which obtain information or useful states for eventually maximizing final reward across a very wide set of environments. There are a lot of links about that in my links there, like Clune’s manifesto.
We have plenty of ideas about how you would do this; for example, here’s an evolution-strategies approach that would meta-learn exploration: ‘For each GPU, initialize a random set of axioms in a formal system(s), specify a random distant target theorem(s); apply random mutations to a large language model pretrained extensively on math corpuses to do tree search (or whatever ATP approach you prefer); do that tree search for a fixed time budget; keep the X% which successfully find their target theorem; repeat until toasted golden-brown and delicious’. This would gradually select for models which embody optimal time-limited search in a very large space of possible formal systems.
To show that it’s not merely theoretical Schimdhuber-esque musing, consider the recent very impressive success in meta-reinforcement-learning I don’t think has been discussed on LW yet and which many people probably still believe impossible in principle*: “VeLO: Training Versatile Learned Optimizers by Scaling Up”, Metz et al 2022 (Twitter)
While deep learning models have replaced hand-designed features across many domains, these models are still trained with hand-designed optimizers. In this work, we leverage the same scaling approach behind the success of deep learning to learn versatile optimizers. We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates.
Meta-trained with approximately four thousand TPU-months of compute on a wide variety of optimization tasks, our optimizer not only exhibits compelling performance, but optimizes in interesting and unexpected ways. It requires no hyperparameter tuning, instead automatically adapting to the specifics of the problem being optimized.
We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive optimizer benchmark suite with baselines at this URL.
Here is a real-world 🚲 example (not in the paper) for T5 Small (~60M params). The VeLO-trained model reaches the same loss as >5x steps of Adafactor and takes only ~1.5x as long per step in wall time 🕓...a >3x speedup⏫!! Pretty amazing for such an OOD task!
(I’ll note in passing that “Here is an amazing DL PoC that neither you nor anyone else has heard of yet, but already does the thing you are speculating that AI might some day do in 2040 after another dozen paradigm shifts” describes a lot of DL research in 2022; there’s so much going on right now, and the people who should be pulling 16-hour days reading Arxiv are often distracted by things like FTX or Elon Musk, or just the image generation stuff, so there are many things falling through the cracks. But 5 years from now, you won’t care how much time you spent following Twitter drama, and you will care about missing out on things like Dramatron or CICERO or U-PaLM or RT-1.)
* Given some of the comments on my Clippy story, anyway...
Some observations:
the use of ChatGPT is a bad idea. ChatGPT greatly underperforms what we know GPT and similar models are capable of in the inner-monologue papers on many math-related questions (particularly after instruction tuning), almost all of them far harder than the CRT. You are wasting your time doing math with ChatGPT unless you are interested in specifically ChatGPT, such as looking into its mode collapse. Which you aren’t. (Remember: “sampling can prove the presence of knowledge, but not the absence”—especially after RL or finetuning intended to change drastically what samples come out!)
You would be better off tinkering with davinci-003 or possibly a codex model (and making sure to try out best-of=20 to avoid being fooled by stochasticity of sampling and see what the ‘best’ results really are).
these systems are already combinable. GPT-f (and later Lean work) benefits from its pretraining on natural language math corpuses, which provides a powerful prior for its ATP. The inner-monologue papers show benefits from a LM thinking about a problem and writing Python code to solve the problem for it (exploiting increasingly-amazing code-writing capabilities). And there’s the autoformalization with LMs which I haven’t yet caught up to in my reading but looks pretty exciting, and Szegedy is excited too.
the ‘math alignment’ problem is about as hard as the ‘image generator alignment problem’, which is to say, little at all like ‘the AI alignment problem’.
The great thing about it is, if you don’t like a result from GPT-f or Minerva or the auto-formalizing models, you just throw it away & try again, no harm done beyond some wasted electricity. (That’s not the case for autonomous agents.) As the joke goes, mathematicians are great—all they need is a chalkboard and a waste basket. So if they build up castles in the sky, that’s not a problem as long as it still proves useful for end applications: does access to those theorems shorten proofs of theorems we care about? Do models trained with that produce better proofs? etc. You can feed them back in and keep going. (This is a common trick now with image-gen: as long as you can generate some images that you want, you can bootstrap by training a DreamBooth or Textual-Inversion or DreamArtist or the finetune du jour on those, generate, select, re-finetune etc.) You can simply keep adding rewards for human-directed uses and penalties for irrelevant tangents and adjusting and doing end-to-end training, RL training if necessary. What they ‘want’ will be defined by the rewards/penalties.
So I don’t see much of a ‘math alignment’ problem beyond a relatively easy engineering problem—the real challenge remains getting systems whose ‘math alignment’ you’d care about at all. (Any threat comes from integrating them into self-improving systems such that they might be doing mathematical research into how to make themselves more powerful, but that requires some degree of agency or RL setting, not just a model in a GPU tracing out a tree.)
from a RL perspective, it’s not absurd at all to suggest that exploration and novelty drives can be meta-learned from a few positive outcomes that yield ground-truth rewards. It’s meta-learning all the way down/up and reward is enough. I can totally believe that math models can do the same thing, learning preferences/esthetics towards work that will one day be useful, based on techniques like PBT selection for models which learn drives towards useful work or training on past human work to predict utility etc.
Thanks for the pointer to davinci-003! I am certainly not interested in ChatGPT specifically, it just happens to be the case that ChatGPT is the easiest to pop open up and start using for a non-expert (like myself). It was fun enough to tinker with, so I look forward to checking out davinci.
I had not heard of GPT-f—appreciate the link to the paper! I’ve seen some lean demonstrations, and they were pretty cool. It did well with some very elementary topology problems (reasoning around the definition of “continuous”), and struggled with analysis in interesting ways. There was some particular theorem (maybe the extreme value theorem? I could be forgetting) that it was able to get in dimension 1 without too much trouble, but that it struggled hard with in dimension 2, in a way that a human would not really struggle with (the proof of this particular in dimension 2, or dimension n for that matter, is very nearly identical at first reading). Breaking down it’s failure, the demonstrators argued pretty convincingly that perhaps there’s actually just honestly more going on in the dimension 2 case than the dimension 1 case that a human prover might be glossing over. The machine can’t say “this set is compact, so the extension of the previous argument is completely analogous/obvious/trivial,” it has to actually go through the details of proving compactness in detail, carrying out the extension, etc.. The details may not be deep (in this particular case), but they are different. I think it would be really cool to see lean-like systems that are capable of making possibly illegitimate logical leaps (canonically: “this is trivial,” or “we leave this as an exercise”) to escape some nasty, tangential sub-sub-argument they’re stuck on, even if they diverge from that point and ultimately get the conclusion wrong (as you say in bullet 3: just run 500 more randomized instances, and see where they go, or inspect the results manually), just to see where they might go next. Regarding code-writing capabilities, it doesn’t seem like there’s anything in principle special about python, and it would be cool to start seeing variants pop up with access to larger mathematics languages (think sage, matlab, mathematica, magma, etc.) that might be more directly useful to automatic proof-writing.
I partially agree and partially disagree on point 3.
Partial agreement: For an application like “I am trying to prove [Theorem X] that I have good reason to believe is completely provable with nothing more than the toolbox already in this paper ([Lemma 1], [Lemma 2], [Lemma 3], …, [Lemma n]) plus standard results in the literature, then I agree, reward it for directions that seem productive and seem likely to shorten the proof, and penalize it for irrelevant tangents. Build a corpus of training theorems that are similar to the things we care about, where we know of the existence of a proof using tools that we know the system is aware of (e.g., don’t train a system on only analysis texts and give a problem that you know to require Galois theory). Some models will produce proofs nearly as short or shorter than the proofs we know about, others will go off on tangents and build castles to nowhere, never really getting around to the point. Clearly we prefer the former. This is likely to be the main near-term application of these systems.
Partial disagreement: For a longer term project, where the system is allowed to make more open ended moves than “help me finish the proof of [Theorem 3.2.6],” I don’t quite agree that it is entirely obvious how to “align” the system to do things that are productive. I expanded a bit on this in another comment in this thread, but imagine a scenario where you have a long (which may mean indefinitely long: maybe 5 years, maybe 20, maybe 200) term project like “prove Fermat’s last theorem” (in a counterfactual where we don’t already know it) or “prove the abc conjecture” or “prove the Collatz conjecture” or something. Some easy to state theorems that are easy for any system competent in basic algebra and number theory manipulations to get started on, easy to generate and test examples, clearly interesting to all parties involves, and of very high interest. We cannot evaluate a system based on whether it produces shorter proofs of FLT in a hypothetical scenario where we don’t already know what the proof of FLT will involve. In the short term, the relative value of “productive-seeming” work (mass volumes of direct manipulations on the equation xn+yn=zn using standard methods that existed in Fermat’s time) vs “irrelevant tangents / castles in the sky” (inventing a theory of primary decompositions, modular forms, schemes; 95% of which is clearly irrelevant to FLT, but access to the broad body of theory appears necessary) is not entirely straightforward, since there is no easy way to test an irrelevant tangent on whether it makes the proof shorter or not. It’s very hard to evaluate whether a research direction is “promising,” and an unfortunate truth is that most of our evaluation comes down to trusting in the intuitions of the researcher that “this is important and relevant to the heart and soul of FLT.”
I think it’s a problem that grant application evaluators face. I want to fund the research team that will most strongly contribute to long term progress on FLT (or pick a problem that seems likely to have a 30 year horizon, rather than a 200 year horizon), and both teams are motivated to convince me they deserve the funding. Even for humans, this is not an easy task. Generally speaking, given a sufficiently technical mathematical paper (or cluster of papers in the authorship graph), there may exist no more than 20 people on Earth that really understand the subject enough to honestly evaluate how relevant that paper is to the grant subject: all of those people are friends of the author (and may be motivated to advance the author’s career by helping them secure a big grant, irrespective of the abstract relevance of the research; one author in their network’s success contributes to their own success, via collaborations, citations, etc.), and none of whom are evaluating the grant applications. That is, someone is building a tower / castle in the sky, and the only people who really understand the castle are personal friends of the builder. We are placing a lot of trust in the builder (and their authorship network) when we choose to fund their project, like it or not. So, here the research teams are two (or a thousand) competing parameterizations of some model in a training environment, I may strongly suspect that massive towers will have to be built, but I don’t want to trust my own intuitions about which towers are useful or not. I want to at least leave open the possibility that we (like Fermat’s contemporaries) have just no idea what needs to be done to make progress, and that we should allow “well-aligned” crazy ideas to flourish. I have to (at least partially) trust that the tower-builders genuinely “believe” on some internal level that they’re building in the right direction.
I of course agree entirely that systems that are advanced enough for the above to actually be a problem worth intensely caring about are quite a ways off, but they will come to exist sooner or later, and I don’t think it’s obvious how to get those systems building in productive directions.
Yeah, that’s the problem with ChatGPT: it’s so easy to use, and so good within a niche, we’re right back to 2020 where everyone is trying the first thing that comes to mind and declaring that GPT is busted if it doesn’t (won’t) do it. Heck, ChatGPT doesn’t even let you set the temperature! (This is on top of the hidden prompt, RLHF, unknown history mechanism, and what seems to be at least 1 additional layer of filter, like the string-matching that the DALL-E 2 service uses to censor and decide when to apply ‘diversity prompt’.) ‘Deep learning is hitting a wall’ etc...
Just remember anytime anyone uses ChatGPT to declare that “DL can’t X”: “sampling can show the presence of knowledge but not the absence.”
Python is special in that there’s a ton of it as data, and so it’s probably the single best language in any code-generating model. Using the special-purpose languages will cost you, making it a classic bias-variance tradeoff. That data richness won’t go away until you have full bootstrapping systems—not the fairly limited one-step self-distillation or exploration that we have now (eg they can generate Python puzzles to train themselves but the bootstrap seems to only work once or twice).
It’s not easy, but I think we have lots of ideas and I already covered them in my final bullet point.
There’s a rich vein of ideas in reinforcement learning & decision theory & evolutionary computation about how to do novelty search or meta-learn exploration/intrinsic-curiosity drives which balance explore-exploit. If you train models based on reward and do things like evolutionary selection, you can develop models which have ‘taste’ in terms of making short-term actions which obtain information or useful states for eventually maximizing final reward across a very wide set of environments. There are a lot of links about that in my links there, like Clune’s manifesto.
We have plenty of ideas about how you would do this; for example, here’s an evolution-strategies approach that would meta-learn exploration: ‘For each GPU, initialize a random set of axioms in a formal system(s), specify a random distant target theorem(s); apply random mutations to a large language model pretrained extensively on math corpuses to do tree search (or whatever ATP approach you prefer); do that tree search for a fixed time budget; keep the X% which successfully find their target theorem; repeat until toasted golden-brown and delicious’. This would gradually select for models which embody optimal time-limited search in a very large space of possible formal systems.
To show that it’s not merely theoretical Schimdhuber-esque musing, consider the recent very impressive success in meta-reinforcement-learning I don’t think has been discussed on LW yet and which many people probably still believe impossible in principle*: “VeLO: Training Versatile Learned Optimizers by Scaling Up”, Metz et al 2022 (Twitter)
https://twitter.com/ada_rob/status/1593702507422912516
(I’ll note in passing that “Here is an amazing DL PoC that neither you nor anyone else has heard of yet, but already does the thing you are speculating that AI might some day do in 2040 after another dozen paradigm shifts” describes a lot of DL research in 2022; there’s so much going on right now, and the people who should be pulling 16-hour days reading Arxiv are often distracted by things like FTX or Elon Musk, or just the image generation stuff, so there are many things falling through the cracks. But 5 years from now, you won’t care how much time you spent following Twitter drama, and you will care about missing out on things like Dramatron or CICERO or U-PaLM or RT-1.)
* Given some of the comments on my Clippy story, anyway...
That paper is insane…you’re finding this stuff just by trawling through Arxiv, or through some other method?
Twitter + Reddit + Google Scholar Alerts on key researchers. Honestly, it’s too much, I am many months behind in my reading.