Sorry, just wanted to focus on one sentence close to the beginning:
“We can barely multiply smallish multi-digit numbers together in our head, when in principle a reasoner could hold thousands of complex mathematical structures in its working memory simultaneously and perform complex operations on them.”
Strangely enough, current LLMs have the exact same issue as humans: they can guess ballpark numerical answers reasonably well, but they are terrible at being precise. Be it drawing the right number of fingers, writing a sentence with exactly 10 words, or multiplying 6-digit numbers, they behave like humans! Or maybe like many other animals, for whom accuracy is important but precision is not.
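To make that accuracy-vs-precision distinction concrete, here is a toy Python sketch. The “model” is a fake stand-in I made up (not a real API call) that simulates the claimed behavior: right order of magnitude, garbled low digits. The scoring is the part that matters: exact match vs. within 1% relative error.

```python
import random

def exact_match(guess: int, truth: int) -> bool:
    # "Precise": every digit right.
    return guess == truth

def ballpark(guess: int, truth: int, tol: float = 0.01) -> bool:
    # "Accurate": within 1% relative error, even if the digits are wrong.
    return abs(guess - truth) <= tol * abs(truth)

def fake_model(a: int, b: int) -> int:
    # Stand-in for an LLM answer (an assumption for illustration, not a real model):
    # it gets the magnitude right but perturbs the low digits.
    truth = a * b
    return truth + random.randint(-(truth // 200), truth // 200)

random.seed(0)
pairs = [(random.randint(100_000, 999_999), random.randint(100_000, 999_999))
         for _ in range(100)]
results = [(fake_model(a, b), a * b) for a, b in pairs]
print("exact match:", sum(exact_match(g, t) for g, t in results), "/ 100")
print("ballpark   :", sum(ballpark(g, t) for g, t in results), "/ 100")
```

Under that simulation the ballpark score is near-perfect while the exact-match score is near zero, which is the pattern I mean.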
What this looks like to me is suspiciously similar to human System 1 vs System 2. The latter is what you seem to count as “general intelligence”: the ability to reason and generalize outside the training distribution, if I understand it correctly. We can do it, albeit slowly and with greater effort. It looks like the current crop of AIs suffers from the same problem: System 1 is what they excel at thanks to their training, like writing or drawing or even generating code. For some reason precise calculations are not built into the training sets, and so the models have a lot of trouble with them.
Interestingly, much as humans use calculators, LLMs can apparently be augmented with something completely foreign, say a Wolfram Alpha plugin, and learn to delegate specific kinds of “reasoning” to those augmentations. But, like humans, they do not learn much from using the augmentations, and revert to their baseline capabilities without them.
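As a toy illustration of what such delegation could look like mechanically, here is a hypothetical scaffold. The CALC(...) marker and the function name are my own invention for the sketch, not any real plugin API: the model is prompted to emit the marker instead of an answer, and the harness substitutes the exact result.

```python
import re

def run_with_calculator(model_output: str) -> str:
    """Replace every CALC(a op b) marker in the model's text with the exact result."""
    def evaluate(match: re.Match) -> str:
        a, op, b = int(match.group(1)), match.group(2), int(match.group(3))
        result = {"+": a + b, "-": a - b, "*": a * b}[op]
        return str(result)
    # The model only has to decide *when* to delegate; the tool does the precise part.
    return re.sub(r"CALC\((\d+)\s*([+\-*])\s*(\d+)\)", evaluate, model_output)

print(run_with_calculator("The product is CALC(123456 * 654321)."))
# -> The product is 80779853376.
```

The point of the sketch is that the precision lives entirely in the tool; take the tool away and the model is back to its baseline.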
The “System 1 vs System 2” domains are not identical for humans and machines, but there is some overlap. It is also apparent that newer models are better at “intuitive reasoning” about more topics than older ones, so maybe this analogy is not a very useful model, at least not in the long term. But I can also imagine a world where some things that are hard for humans and require deliberate, painstaking learning are also hard for machines and require similarly slow and effortful learning on top of the usual training… with potential implications for the AGI ruin scenarios.
Similar to humans, LLMs can do 6-digit multiplication with sufficient prompting/structure!
https://www.lesswrong.com/posts/XvorpDSu3dwjdyT4f/gpt-4-multiplication-competition
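As an illustration of the kind of structure involved (this is the general idea of decomposition, not the actual prompts from the linked post), schoolbook long multiplication turns one hard 6-digit step into many easy single-digit steps plus shifts and additions:

```python
def long_multiply(a: int, b: int) -> int:
    """Schoolbook long multiplication: sum of single-digit partial products."""
    total = 0
    for i, digit_char in enumerate(reversed(str(b))):
        partial = a * int(digit_char)   # easy: multiply by one digit
        total += partial * 10 ** i      # shift by that digit's place value
    return total

a, b = 123456, 654321
assert long_multiply(a, b) == a * b
print(long_multiply(a, b))  # 80779853376
```

Each individual step is the kind of thing the model is already good at; the scaffold just forces it to take the small steps instead of guessing the product in one go.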
Right… Which kind of fits with easy vs hard learning.