That’s an interesting argument. However, something similar to your hypothetical explanation in footnote 6 suggests the following hypothesis: most humans aren’t optimized by evolution to be good at abstract physics reasoning, even though they easily could have been, with evolutionarily small changes in hyperparameters. After all, Einstein wasn’t too dissimilar from the rest of us in training/inference compute and architecture. This explanation seems somewhat plausible, since highly abstract reasoning ability perhaps wasn’t very useful for most of human history.
(An argument in a similar direction is the existence of savant syndrome, which implies that quite small differences in brain hyperparameters can lead to strongly increased narrow capabilities of some form. Those capabilities likely weren’t useful in the ancestral environment, which would explain why humans generally lack them. The Einstein case suggests a similar phenomenon may also exist for more general abstract reasoning.)
If this is right, humans would be analogous to very strong base LLMs with poor instruction tuning, where the instruction tuning (for example) only involved narrow instruction-execution pairs more or less directly related to finding food in the wilderness, survival, and reproduction. That would lead to bad performance on many tasks not closely related to fitness, e.g. math benchmarks. The point is that a lot of the “raw intelligence” of the base LLM couldn’t be accessed simply because the model wasn’t tuned to be good at diverse abstract tasks, even though it easily could have been, without any big change in architecture or training/inference compute.
But then it seems unlikely that artificial ML models (like LLMs) are or will be unoptimized for highly abstract reasoning in the way evolution apparently didn’t “care” to make us all great at abstract physics- and math-style thinking, since AI models are actively optimized in diverse abstract directions. That would make it unlikely to get a large capability jump (analogous to Einstein or von Neumann) just from tweaking the hyperparameters a bit, since those are probably pretty well optimized already.
If this explanation is correct, we shouldn’t expect sudden large (Einstein-like) capability gains once AI models reach roughly Einstein-level ability.
Your alternative explanation is that there is indeed a phase transition at a certain intelligence level, which leads to big gains from small tweaks in hyperparameters, perhaps because of something like the “grokking cascade” you mentioned. That would mean Einstein wasn’t so good at physics because he happened, unlike most humans, to be “optimized for abstract reasoning”, but because he reached an intelligence level where some such grokking cascade occurs naturally. Then a similar thing could indeed happen for AI at some point.
I’m not sure which explanation is better.
Some evidence in favor of your explanation (as at least a partially correct explanation):
1. von Neumann apparently envied Einstein’s physics intuitions, while Einstein lacked von Neumann’s math skills. This seems to suggest that they were “tuned” in slightly different directions.
2. Neither of the two seems superhumanly accomplished in other areas (that a smart person/agent might have goals in), such as making money, moral/philosophical progress, or changing culture/politics in their preferred direction.
(An alternative explanation for 2 is that they could have been superhuman in other areas, but their terminal goals did not chain through instrumental goals in those areas; that in turn raises the question of what those terminal goals must have been for this explanation to be true, and what that says about human values.)
I note that under your explanation, someone could surprise the world by tuning a not-particularly-advanced AI for a task nobody previously thought to tune AI for, or by inventing a better tuning method (either general or specialized), thus achieving a large capability jump in one or more domains. I’m not sure how worrisome this is, though.
As it turns out, von Neumann was good at lots of things.
https://qualiacomputing.com/2018/06/21/john-von-neumann/
Von Neumann himself was perpetually interested in many fields unrelated to science. Several years ago his wife gave him a 21-volume Cambridge History set, and she is sure he memorized every name and fact in the books. “He is a major expert on all the royal family trees in Europe,” a friend said once. “He can tell you who fell in love with whom, and why, what obscure cousin this or that czar married, how many illegitimate children he had and so on.” One night during the Princeton days a world-famous expert on Byzantine history came to the Von Neumann house for a party. “Johnny and the professor got into a corner and began discussing some obscure facet,” recalls a friend who was there. “Then an argument arose over a date. Johnny insisted it was this, the professor that. So Johnny said, ‘Let’s get the book.’ They looked it up and Johnny was right. A few weeks later the professor was invited to the Von Neumann house again. He called Mrs. von Neumann and said jokingly, ‘I’ll come if Johnny promises not to discuss Byzantine history. Everybody thinks I am the world’s greatest expert in it and I want them to keep on thinking that.’”
____
According to the same article, he was not such a great driver.
Now compare him to another famous figure of his age, Menachem Mendel Schneerson. Schneerson was legendary for his ability to recall obscure sections of Torah verbatim, and for his insightful reasoning (I am speaking lightly here; his impact was incredible). On the hypothetical that von Neumann and Schneerson had a similar gift (their ability with the written word as a reflection of their general ability), then, depending on your worldview, either Schneerson’s talents were not properly put to use in the service of science, or von Neumann’s talents were wasted in not becoming a gaon.
Perhaps, if von Neumann had engaged in Torah instead of science, we could have been spared nuclear weapons and maybe even AI for some time. Sure, maybe someone else would have done what he did...but who?