Thank you for putting in the effort required to review this. Posts like this help a lot in interpreting hyped literature. I am also skeptical about whether LLMs are the path to AGI, and I would likely have counted the paper as an additional (small) data point in favour of that conclusion if not for your detailed summary (I did not read the original paper myself and had no intention to, hence only a ‘small’ data point).
“There’s a common assumption in many LLM critiques that reasoning ability is binary: either you have it, or you don’t.”
I agree, and would even push it further: I think the crux of the whole issue is our lack of good understanding of the concept space we refer to with words such as “reasoning”, “thinking”, “intelligence” or “agency”.
I do have a hunch that we face some inherent limitations when trying to use our own “reasoning”/“intelligence”/etc. to understand this space itself (something like Gödel’s incompleteness theorem), but I do not have a proof. Then again, maybe not, and we will figure it out.
Whatever the case, we are not good at it right now. Here is an analogy, imagining we had the same (lack of) understanding of moving around in physical space:
Car-3.5 is invented and is used to carry things from one location to another. Some people claim it cannot “move on its own”: Car-3.5 could not get over hill #1 or through muddy field #1, so it is just a road-following engine, not a general artificial mover. Car-4 is created with a stronger engine and can climb over hill #1; Car-o1 has a better transmission and wheels and can cross muddy field #1. It still cannot cross hill #2 or creek #1, so some people again claim that it cannot actually move on its own. Other people show that simply increasing engine power and using tricks like adjusting the wheel structure helps overcome these obstacles. They also point out that most humans would be unable to cross hill #2 or creek #1 either; we would just follow the roads and use the tunnel or bridge, just as Car-o1 does. Are we, then, not general movers either? Do we only need to increase engine power and put some scaffolding in place for creek crossing to get something that can move at least as well as a human in all terrain?
Replace the “Car-” string with “Humanoid_robot-”, and think about it again. Change back to “Car-” but imagine this is the thought process of a horse, and think about it again.
We do not know which of the three variants describes our situation best.
Adding some context / a kind-of counterargument from Reddit (the link points to the main article and contains a summary of it):
https://www.reddit.com/r/CredibleDefense/comments/1ll7ypj/article_i_fought_in_ukraine_and_heres_why_fpv/
I think the comments are also worth a read. I want to share one particular comment here, which I think has a good explanation/hypothesis regarding the situation: