For our choice of workshop, we believe the ICBINB workshop is highly relevant to the purpose of our experiment. As we wrote in the main text, we selected this workshop because of its broader scope, which challenges researchers (and our AI Scientist) to tackle diverse research topics addressing practical limitations of deep learning, unlike most workshops with a narrow focus on a single topic.
This workshop focuses particularly on understanding the limitations of deep learning methods applied to real-world problems, and it encourages participants to study negative experimental outcomes. Some may criticize our choice of a workshop that encourages discussion of “negative results” (implying that papers discussing negative results are failed scientific discoveries), but we disagree, and we believe this is an important topic.
And while it is true that “negative results” are important to report, “we report a negative result because our AI agent put forward a reasonable and interesting hypothesis, competently tested the hypothesis, and found that the hypothesis was false” looks a lot like “our AI agent put forward a reasonable and interesting hypothesis, flailed around trying to implement it, had major implementation problems, and wrote a plausible-sounding paper describing its failure as a fact about the world rather than a fact about its own skill level”.
The paper has a few places with giant red flags where the reviewer seems to assume that there were solid results which the author of the paper was simply not reporting skillfully, for example in section B2.
I favor an alternative hypothesis: the Sakana agent determines where a graph belongs, what would be on the X and Y axes of that graph, what it expects the graph to look like, and how to generate that graph. It then generates the graph and inserts the caption describing what the graph would show if its hypothesis were correct. The agent has no particular ability to notice that its description doesn’t match the graph.
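To make that failure mode concrete, here is a minimal sketch of the pipeline I’m hypothesizing. This is not Sakana’s actual code; the function names and the toy experiment are made up. The point is only that the caption is written from the plan, and nothing ever compares it against the figure that was actually produced:

```python
# Hypothetical sketch of the failure mode described above -- NOT Sakana's code.
import numpy as np
import matplotlib.pyplot as plt


def plan_figure():
    """Agent decides what the figure *should* show (its hypothesis)."""
    return {
        "x_label": "regularization strength",
        "y_label": "compositional generalization accuracy",
        "expected_trend": "accuracy increases monotonically with regularization",
    }


def run_experiment():
    """The actual experiment -- here, results that contradict the plan."""
    x = np.linspace(0.0, 1.0, 20)
    y = 0.6 - 0.3 * x + 0.05 * np.random.randn(20)  # accuracy actually *decreases*
    return x, y


def write_caption(plan):
    """Caption is generated from the plan alone, never from the data."""
    return f"Figure 1: {plan['expected_trend'].capitalize()}."


plan = plan_figure()
x, y = run_experiment()

plt.plot(x, y)
plt.xlabel(plan["x_label"])
plt.ylabel(plan["y_label"])
plt.savefig("figure1.png")

# The printed caption claims a trend that the saved figure does not show,
# and no step in the pipeline checks for that mismatch.
print(write_caption(plan))
```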
I’m skeptical.
Did the Sakana team publish the code that their scientist agent used to write the compositional regularization paper? The post says