LLMs are evidence that abstract reasoning ability emerges as a side effect of solving any sufficiently hard and general problem with enough effort.
I’d probably not claim this, or at least significantly limit the claim, because of some new results on Transformers/LLMs: they don’t actually do multi-step reasoning, but instead develop shortcuts, and, importantly, they can’t implement recursive algorithms.
Tweets below:
AK on Twitter: “Faith and Fate: Limits of Transformers on Compositionality. Transformer large language models (LLMs) have sparked admiration for their exceptional performance on tasks that demand intricate multi-step reasoning. Yet, these models simultaneously show failures on surprisingly… https://t.co/lsEfo9trPR”
Talia Ringer @ FCRC on Twitter: “New preprint just dropped! ‘Can Transformers Learn to Solve Problems Recursively?’ With @dylanszzhang, @CurtTigges, @BlancheMinerva, @mraginsky, and @TaliaRinger. https://t.co/D13mD2Q7aq https://t.co/wqM2FPQEQ4”
This is actually a fairly important limitation of LLMs, and it appears to partly vindicate LLM skeptics like Yann LeCun and Gary Marcus, in that LLMs don’t actually reason all that well on multi-step problems.
That said, it seems fairly easy to work around this limitation by writing prompts that break a problem into pieces, calling a new instance of the LLM to solve each piece, and then having it produce the final answer given the step-by-step reasoning from the previous prompts. SmartGPT does something like this, and achieves vastly better performance on the logical reasoning benchmarks it’s been tested on.
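To make the pattern concrete, here is a minimal sketch of that decompose-then-synthesize approach. This is not SmartGPT’s actual implementation; `call_llm` is a hypothetical stand-in for whatever chat-completion API you use, and the prompts are purely illustrative.

```python
# Minimal sketch of decompose-then-synthesize prompting.
# `call_llm` is a hypothetical placeholder for any chat-completion API:
# a function that takes a prompt string and returns the model's text reply.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your LLM client of choice")


def solve_with_decomposition(problem: str) -> str:
    # 1. Ask one LLM call to split the problem into smaller sub-steps.
    plan = call_llm(
        "Break the following problem into a numbered list of small "
        f"sub-steps. Do not solve them yet.\n\n{problem}"
    )
    steps = [line for line in plan.splitlines() if line.strip()]

    # 2. Solve each sub-step in a fresh call, passing along earlier results
    #    so later steps can build on the reasoning so far.
    worked_steps: list[str] = []
    for step in steps:
        context = "\n".join(worked_steps)
        answer = call_llm(
            f"Original problem:\n{problem}\n\n"
            f"Work so far:\n{context or '(none yet)'}\n\n"
            f"Now carefully solve just this sub-step:\n{step}"
        )
        worked_steps.append(f"{step}\nResult: {answer}")

    # 3. Final call: synthesize an answer from the accumulated reasoning.
    return call_llm(
        f"Original problem:\n{problem}\n\n"
        "Step-by-step reasoning from previous calls:\n"
        + "\n\n".join(worked_steps)
        + "\n\nUsing only the reasoning above, state the final answer."
    )
```

The point of the structure is that no single forward pass has to carry the whole multi-step chain: each call handles one piece, and the recursion the model can’t do internally is pushed out into the orchestration code.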