I disagree; in fact, I think you can argue this development points in the opposite direction, once you look at what they had to do to achieve it and the architecture they used.
I suggest you read Ernest Davis’ overview of Cicero. Cicero is a special-purpose system that took enormous work to produce—a team of multiple people labored on it for three years. They had to assemble a massive dataset from 125,300 online human games. They also had to get expert annotations on thousands of preliminary outputs. Even that was not enough: they had to generate synthetic datasets as well to fix issues with the system! And even then, the dialogue module required a specialized filter to remove nonsense. This is a break from the scaling idea, which says that to solve new problems you just need to scale existing architectures to more parameters (and train on a large enough dataset).
Additionally, Davis argues that this system appears very unlikely to generalize to other problems, or even to slight modifications of the game of Diplomacy. It’s not even clear how well it would generalize to non-blitz games. If the rules were modified slightly, the entire system would likely have to be retrained.
I also want to point out that scientific research is not as easy as you make it sound. Professors spend the bulk of their time writing proposals, so perhaps AI could help there by summarizing existing literature. Note, though, that a typical paper, even a low-value one, generally takes a graduate student with specialized training about a year to complete, assuming the experimental apparatus and other necessary infrastructure are already in place. Not all science is data-driven, either: science can also be observation-driven or theory-driven.