So, the LLM generated five hypotheses, one of which the team also agrees with, but has not verified?
The article frames the extra hypotheses as making the results more impressive, but it seems to me that they make the results less impressive—if the LLM generates enough hypotheses, and you already know the answer, one of them is likely to sound like the answer.
As far as I understand from the article, the LLM generated five hypotheses that make sense. One of them is the one the team had already verified but hadn’t yet published anywhere, and another is one the team hadn’t even thought of but considers worth investigating.
Assuming the five are a representative sample rather than a small human-curated set of many more hypotheses, I think that’s pretty impressive.
“if the LLM generates enough hypotheses, and you already know the answer, one of them is likely to sound like the answer.”
I don’t think this is true in general. Take any problem that is difficult to solve but easy to verify (factoring a large number, say): you aren’t likely to have an LLM guess the answer.
I am skeptical of the claim that the research is unique and hasn’t been published anywhere, and I’d also really like to know the details regarding what they prompted the model with.
The whole co-scientist thing looks really weird. Look at the graph there. Am I misreading it, or did people rate it just barely better than raw o1 outputs? How is that consistent with it apparently pulling all of these amazing discoveries out of the air?
Edit: Found (well, Grok 3 found) an article with some more details regarding Penadés’ work. Apparently they did publish a related finding, and did feed it into the AI co-scientist system.
Generalizing, my current take on it is that they – and probably all the other teams now reporting amazing results – fed the system a ton of clues regarding the answer, on top of implicitly pre-selecting problems where they already knew there was a satisfying solution to be found.
Yeah, my general assumption in these situations is that the article is likely overstating things for a headline and reality is not so clear-cut. Skepticism is definitely warranted.