Alignment Math people: I would appreciate it if someone could review this video of Terrence Tao giving a presentation on machine-assisted proofs to give feedback on what they think an ideal alignment assistant could do in this domain.
In addition, I’m thinking of eventually looking at models like DeepSeek-Prover to see if they can be beneficial for assisting alignment researchers in creating proofs:
Proof assistants like Lean have revolutionized mathematical proof verification, ensuring high accuracy and reliability. Although large language models (LLMs) show promise in mathematical reasoning, their advancement in formal theorem proving is hindered by a lack of training data. To address this issue, we introduce an approach to generate extensive Lean 4 proof data derived from high-school and undergraduate-level mathematical competition problems. This approach involves translating natural language problems into formal statements, filtering out low-quality statements, and generating proofs to create synthetic data. After fine-tuning the DeepSeekMath 7B model on this synthetic dataset, which comprises 8 million formal statements with proofs, our model achieved whole-proof generation accuracies of 46.3% with 64 samples and 52% cumulatively on the Lean 4 miniF2F test, surpassing the baseline GPT-4 at 23.0% with 64 samples and a tree search reinforcement learning method at 41.0%. Additionally, our model successfully proved 5 out of 148 problems in the Lean 4 Formalized International Mathematical Olympiad (FIMO) benchmark, while GPT-4 failed to prove any. These results demonstrate the potential of leveraging large-scale synthetic data to enhance theorem-proving capabilities in LLMs. Both the synthetic dataset and the model will be made available to facilitate further research in this promising field.
I’ve experimented with Claude Opus for simple Ada autoformalization test cases (specifically quicksort), and it seems like the sort of issues that make LLM agents infeasible (hallucination-based drift, subtle drift caused by sticking to certain implicit assumptions you made before) are also the issues that make Opus hard to use for autoformalization attempts.
I haven’t experimented with a scaffolded LLM agent for autoformalization, but I expect it won’t go very well either, primarily because scaffolding involves attempts to make human-like implicit high-level cognitive strategies into explicit algorithms or heuristics such as tree of thought prompting, and I expect that this doesn’t scale given the complexity of the domain (sufficently general autoformalizing AI systems can be modelled as effectively consequentialist, which makes them dangerous). I don’t expect for a scaffolded (over Opus) LLM agent to succeed at autoformalizing quicksort right now either, mostly because I believe RLHF tuning has systematically optimized Opus to write the bottom line first and then attempt to build or hallucinate a viable answer, and then post-hoc justify the answer. (While steganographic non-visible chain-of-thought may have gone into figuring out the bottom line, it still is worse than first doing visible chain-of-thought such that it has more token-compute-iterations to compute its answer.)
If anyone reading this is able to build a scaffolded agent that autoformalizes (using Lean or Ada) algorithms of complexity equivalent to quicksort reliably (such that more than 5 out of 10 of its attempts succeed) within the next month of me writing this comment, then I’d like to pay you 1000 EUR to see your code and for an hour of your time to talk with you about this. That’s a little less than twice my current usual monthly expenses, for context.
Great. Yeah, I also expect that it is hard to get current models to work well on this. However, I will mention that the DeepSeekMath model does seem to outperform GPT-4 despite having only 7B parameters. So, it may be possible to create a +70B fine-tune that basically destroys GPT-4 at math. The issue is whether it generalizes to the kind of math we’d commonly see in alignment research.
Additionally, I expect at least a bit can be done with scaffolding, search, etc. I think the issue with many prompting methods atm is that they are specifically trying to get the model to arrive at solutions on their own. And what I mean by that is that they are starting from the frame of “how can we get LLMs to solve x math task on their own,” instead of “how do we augment the researcher’s ability to arrive at (better) proofs more efficiently using LLMs.” So, I think there’s room for product building that does not involve “can you solve this math question from scratch,” though I see the value in getting that to work as well.
Alignment Math people: I would appreciate it if someone could review this video of Terrence Tao giving a presentation on machine-assisted proofs to give feedback on what they think an ideal alignment assistant could do in this domain.
In addition, I’m thinking of eventually looking at models like DeepSeek-Prover to see if they can be beneficial for assisting alignment researchers in creating proofs:
I’ve experimented with Claude Opus for simple Ada autoformalization test cases (specifically quicksort), and it seems like the sort of issues that make LLM agents infeasible (hallucination-based drift, subtle drift caused by sticking to certain implicit assumptions you made before) are also the issues that make Opus hard to use for autoformalization attempts.
I haven’t experimented with a scaffolded LLM agent for autoformalization, but I expect it won’t go very well either, primarily because scaffolding involves attempts to make human-like implicit high-level cognitive strategies into explicit algorithms or heuristics such as tree of thought prompting, and I expect that this doesn’t scale given the complexity of the domain (sufficently general autoformalizing AI systems can be modelled as effectively consequentialist, which makes them dangerous). I don’t expect for a scaffolded (over Opus) LLM agent to succeed at autoformalizing quicksort right now either, mostly because I believe RLHF tuning has systematically optimized Opus to write the bottom line first and then attempt to build or hallucinate a viable answer, and then post-hoc justify the answer. (While steganographic non-visible chain-of-thought may have gone into figuring out the bottom line, it still is worse than first doing visible chain-of-thought such that it has more token-compute-iterations to compute its answer.)
If anyone reading this is able to build a scaffolded agent that autoformalizes (using Lean or Ada) algorithms of complexity equivalent to quicksort reliably (such that more than 5 out of 10 of its attempts succeed) within the next month of me writing this comment, then I’d like to pay you 1000 EUR to see your code and for an hour of your time to talk with you about this. That’s a little less than twice my current usual monthly expenses, for context.
Great. Yeah, I also expect that it is hard to get current models to work well on this. However, I will mention that the DeepSeekMath model does seem to outperform GPT-4 despite having only 7B parameters. So, it may be possible to create a +70B fine-tune that basically destroys GPT-4 at math. The issue is whether it generalizes to the kind of math we’d commonly see in alignment research.
Additionally, I expect at least a bit can be done with scaffolding, search, etc. I think the issue with many prompting methods atm is that they are specifically trying to get the model to arrive at solutions on their own. And what I mean by that is that they are starting from the frame of “how can we get LLMs to solve x math task on their own,” instead of “how do we augment the researcher’s ability to arrive at (better) proofs more efficiently using LLMs.” So, I think there’s room for product building that does not involve “can you solve this math question from scratch,” though I see the value in getting that to work as well.