For example, it makes sense that a future LLM would be able to explain a mathematical concept that has been documented and previously discussed, but I just can’t see it solving existing frontier problems in mathematical theory, as that’s a completely different “skillset”.
Most non-mathematician humans such as myself are arguably in the same boat for this specific example. I certainly wouldn’t know how to begin working on frontier mathematical theory, yet if I were an AI I would still fit many definitions of an AGI, albeit a lowly human-level one.
It would be useful to have a distinction between ‘routine’ logical reasoning (being able to apply established knowledge dynamically) and novel logical reasoning (being able to synthesize new theories/ideas from existing ones).
LLMs can arguably do the first but not the second, though the line dividing the two can be hazy. Many “novel advances” are not ‘new’ so much as existing approaches from different domains being recombined to solve a different type of problem than they had previously been used for. Is this routine or novel? Could an LLM do this? Particularly if the problem were phrased in a tabooed/abstracted way, avoiding leading questions/keywords that would encourage the LLM to regurgitate the common approaches?
An impressive thing about LLMs is how well they can work with arbitrary problems, so long as the problems are not too far out of the training distribution.
An example:
I asked ChatGPT 4o:
“A widget = 14
A phro = −3
A midge = 5
An attached wrok multiplies value by 0.5
What is the value of: A widget with an attached wrok plus three phros and two midges, and three wroks by themselves?”
Solving this sort of (simple to a human) word problem requires combining the domains of basic math and conceptual relations between objects (i.e. <object> can have <relation> to another <object>, which interacts with that object’s properties), and also the ability to make assumptions about missing information (no explicit value was assigned to solitary wroks).
GPT answer:
“To find the value of the given expression, let’s break it down step by step.
1. **Value of a widget with an attached wrok:**
- A widget is worth 14.
- An attached wrok multiplies the value by 0.5.
- So, the value of a widget with an attached wrok is 14×0.5=7.
2. **Value of three phros:**
- A phro is worth −3.
- So, three phros are 3×−3=−9.
3. **Value of two midges:**
- A midge is worth 5.
- So, two midges are 2×5=10.
4. **Value of three wroks by themselves:**
- Since a wrok by itself is not attached to anything and we don’t have a specified standalone value for a wrok, we can assume it doesn’t add any value by itself. So, three wroks by themselves are worth 3×0=0.
Now, we need to add all these values together:
7+(−9)+10+0
First, let’s handle the addition of 7 and −9:
7+(−9)=−2
Then, add the result to 10:
−2+10=8
And finally, add the result to 0:
8+0=8
So, the value of a widget with an attached wrok plus three phros and two midges, and three wroks by themselves is 8.”
Is this logical reasoning? Depends on how you define it. It’s certainly useful. How much of the thinking we humans do is ‘mere’ pattern matching with variable inputs? I suspect a large amount.
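For what it’s worth, the arithmetic itself is easy to verify; here’s a minimal Python sketch of the same calculation, under the same assumption the model made (that a solitary wrok is worth 0):

```python
# A sketch of the word problem in code, assuming (as GPT did) that a wrok
# with nothing attached contributes 0 on its own.
widget, phro, midge = 14, -3, 5
wrok_multiplier = 0.5      # an attached wrok halves the value of its host
solitary_wrok = 0          # assumption: no standalone value was given in the prompt

total = (widget * wrok_multiplier   # a widget with an attached wrok -> 7
         + 3 * phro                 # three phros -> -9
         + 2 * midge                # two midges -> 10
         + 3 * solitary_wrok)       # three wroks by themselves -> 0

print(total)  # 8.0, matching the model's answer
```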