If this kind of approach to mathematics research becomes mainstream, out-competing humans working alone, that would be pretty convincing. So there is nothing that disqualifies this example—it does update me slightly.
However, this example on its own seems unconvincing for a couple of reasons:
- It seems that the results were in fact proven by humans first, calling into question the claim that the proof insight belonged to the LLM (even though the authors try to frame it that way).
- From the reply on X, it seems the results of the paper may not have been novel. In that case, it's hard to view this as evidence for LLMs accelerating mathematical research.
I have personally attempted to interactively use LLMs in my research process, though NOT with anything like this degree of persistence. My impression is that it becomes very easy to feel that the LLM is "almost useful," but after endless attempts it never actually becomes useful for mathematical research (it can be useful for other things, like rapidly prototyping or debugging code). My suspicion is that this feeling of "almost usefulness" is an illusion; here's a related comment from my shortform: https://www.lesswrong.com/posts/RnKmRusmFpw7MhPYw/cole-wyeth-s-shortform?commentId=Fx49CwbrH7ucmsYhD
Does this paper look more like mathematicians experimenting with an LLM to try to get useful intellectual labor out of it, resulting in some curiosities but NOT accelerating their work, or does it look like they adopted the LLM for practical reasons? If it's the former, it seems to fall under the category of proving a conjecture that was constructed for that purpose (to be proven by an LLM).