I think the main way that proof assistant research feeds into capabilities research is not through the assistants themselves, but through the transfer of that research into building foundation models with better reasoning capabilities. I think researching better proof assistants can shorten timelines.
See also Demis Hassabis’s recent tweet. Admittedly, it’s unclear whether he refers to AlphaProof itself being accessible from Gemini, or to the research into AlphaProof feeding into improvements of Gemini.
See also this important paragraph from the AlphaProof blogpost: “As part of our IMO work, we also experimented with a natural language reasoning system, built upon Gemini and our latest research to enable advanced problem-solving skills. This system doesn’t require the problems to be translated into a formal language and could be combined with other AI systems. We also tested this approach on this year’s IMO problems and the results showed great promise.”
I can see that research into proof assistants might lead to better techniques for combining foundation models with RL. Is there anything more specific that you imagine? Outside of math, the problems are very different because there is no easy way to synthetically generate a lot of labeled data (as opposed to formally verifiable proofs).
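To make the contrast I have in mind concrete, here is a hedged Python sketch (the function names are just illustrative, and it assumes a local Lean 4 toolchain whose `lean` command type-checks a file and exits non-zero on failure) of why verifiable proofs make labels essentially free:

```python
# Minimal sketch (not anyone's actual pipeline): with formal proofs, the
# proof checker acts as a free, exact labeler, so labeled data scales with
# however many candidates you can sample. Assumes a Lean 4 toolchain on PATH
# whose `lean` command type-checks a file and exits non-zero on failure.
import subprocess
import tempfile
from pathlib import Path

def label_candidate(theorem_statement: str, proof_script: str) -> int:
    """Return 1 if Lean accepts the candidate proof, else 0."""
    source = f"theorem candidate : {theorem_statement} := by\n{proof_script}\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "Candidate.lean"
        path.write_text(source)
        result = subprocess.run(["lean", str(path)], capture_output=True)
        return int(result.returncode == 0)

def build_labeled_data(theorem_statement: str, candidates: list[str]) -> list[tuple[str, int]]:
    # Every sampled candidate gets an exact 0/1 label with no human in the loop.
    return [(p, label_candidate(theorem_statement, p)) for p in candidates]
```

Outside of formally verifiable domains there is no checker that hands you those exact labels, which is what my question is getting at.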
While some AI techniques developed for proof assistants might be transferable to other problems, I can easily imagine a responsible actor[1] producing a net positive. Don’t disclose your techniques (except maybe very judiciously), don’t open-source your code, maintain information security, maybe only provide access as a service, maybe only provide access to select people/organizations.
[1] To be clear, I don’t consider Alphabet to be a responsible actor.
Not much more specific! I guess from a certain level of capabilities onward, one could create labels with foundation models that evaluate reasoning steps. This is much fuzzier than math, but I still guess a person who created a groundbreaking proof assistant would be extremely valuable for any effort that tries to make foundation models reason reliably. And if they worked at a company like Google, I think their ideas would likely diffuse even if they didn’t want to work on foundation models.
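Roughly the kind of thing I mean, as a sketch under the assumption that a sufficiently capable model is available as a judge (the function and parameter names here are just mine):

```python
# Hedged sketch of the fuzzier analogue outside math: a strong foundation
# model grades individual reasoning steps, and those grades become (noisy)
# training labels. The `judge` callable is a stand-in for whatever
# foundation-model API one has access to; the prompt is just illustrative.
from typing import Callable, List

def label_reasoning_steps(
    problem: str,
    steps: List[str],
    judge: Callable[[str], str],
) -> List[int]:
    """Label each step 1 (judged valid) or 0 (judged flawed)."""
    labels = []
    for i, step in enumerate(steps):
        prompt = (
            f"Problem: {problem}\n"
            f"Reasoning so far: {' '.join(steps[:i])}\n"
            f"Proposed next step: {step}\n"
            "Is this next step logically valid? Answer YES or NO."
        )
        answer = judge(prompt).strip().upper()
        labels.append(1 if answer.startswith("YES") else 0)
    return labels
```

Unlike proof-checker labels, these are only as reliable as the judging model, which is why I call it fuzzy.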
Thanks for your details on how someone could act responsibly in this space! That makes sense. I think one caveat is that proof assistant research might need enormous amounts of compute, and so it’s unclear how to work on it productively outside of a company where the ideas would likely diffuse.
There does seem to be some transfer, though, between math or code capabilities (for which synthetic data can often be easily created and verified) and broader agentic (LLM) capabilities; see e.g. https://x.com/YangjunR/status/1793681237275820254/photo/2.