In a slightly different direction than proof assistants, I’m reminded of Terry Tao’s recent experience trying out GPT-4 to play the role of collaborator:
As I noted at this MathOverflow answer (with a concurrence by Bill Thurston), one of the most intellectually satisfying experiences as a research mathematician is interacting at the blackboard with one or more human co-authors who are exactly on the same wavelength as oneself while working collaboratively on the same problem. I do look forward to the day that I can have a similar conversation with an AI attuned to my way of thinking, or (in the more distant future) talking to an attuned AI version of a human colleague when that human colleague is not available for whatever reason. (Though in the latter case there are some non-trivial issues regarding security, privacy, intellectual property, liability, etc. that would likely need to be resolved first before such public AI avatars could be safely deployed.)
I have experimented with prompting GPT-4 to play the role of precisely such a collaborator on a test problem, with the AI instructed to suggest techniques and directions rather than to directly attempt solve the problem (which the current state-of-the-art LLMs are still quite terrible at). Thus far, the results have been only mildly promising; the AI collaborator certainly serves as an enthusiastic sounding board, and can sometimes suggest relevant references or potential things to try, though in most cases these are references and ideas that I was already aware of and could already evaluate, and were also mixed in with some less relevant citations and strategies. But I could see this style of prompting being useful for a more junior researcher, or someone such as myself exploring an area further from my own area of expertise. And there have been a few times now where this tool has suggested to me a concept that was relevant to the problem in a non-obvious fashion, even if it was not able to coherently state why it was in fact relevant. So while it certainly isn’t at the level of a genuinely competent collaborator yet, it does have potential to evolve into one as the technology improves (and is integrated with further tools, as I describe in my article).
I could feed GPT-4 the first few PDF pages of a recent math preprint and get it to generate a half-dozen intelligent questions that an expert attending a talk on the preprint could ask. I plan to use variants of such prompts to prepare my future presentations or to begin reading a technically complex paper. Initially, I labored to make the prompts as precise as possible, based on experience with programming or scripting languages. Eventually the best results came when I unlearned that caution and simply threw lots of raw text at the AI. …
I now routinely use GPT-4 to answer casual and vaguely phrased questions that I would previously have attempted with a carefully prepared search-engine query. I have asked it to suggest first drafts of complex documents I had to write.
Which isn’t to say that his experience has been all positive; the usual hallucination issues still crop up:
Current large language models (LLM) can often persuasively mimic correct expert response in a given knowledge domain (such as my own, research mathematics). But as is infamously known, the response often consists of nonsense when inspected closely. Both humans and AI need to develop skills to analyze this new type of text. The stylistic signals that I traditionally rely on to “smell out” a hopelessly incorrect math argument are of little use with LLM-generated mathematics. Only line-by-line reading can discern if there is any substance. Strangely, even nonsensical LLM-generated math often references relevant concepts. With effort, human experts can modify ideas that do not work as presented into a correct and original argument.
One related direction where some progress is likely to be made in the near future is in using LLMs to semi-automate some aspects of formalizing a mathematical proof in a formal language such as Lean; see this recent talk by Jason Rute for a survey of the current state of the art. There are already some isolated examples in which a research paper is submitted in conjunction with a formally verified version of the proofs, and these new tools may make this practice more common. One could imagine journals offering an expedited refereeing process for such certified submissions in the near future, as the referee is freed to focus on other aspects of the paper such as exposition and impact.
In a slightly different direction than proof assistants, I’m reminded of Terry Tao’s recent experience trying out GPT-4 to play the role of collaborator:
Terry sounded more enthusiastic here:
Which isn’t to say that his experience has been all positive; the usual hallucination issues still crop up:
And going back to proof assistants: