This is the first model where we have strong evidence that the LLM is actually reasoning/generalizing and not just memorizing its data.
Really? There were many examples where even GPT-3 solved simple logic problems that couldn’t be explained by it having the solution memorized. The effectiveness of chain-of-thought prompting was discovered while GPT-3 was the current model. GPT-4 could do fairly advanced math problems, explain jokes, etc.
The o1-preview model exhibits a substantive improvement in CoT reasoning, but arguably not something fundamentally different.
True enough, and I should probably rewrite the claim.
Though, what was the logic problem that was solved without memorization?
I don’t remember exactly, but there were debates (e.g. involving Gary Marcus) on whether GPT-3 was merely a stochastic parrot, based on various examples. The consensus here was that it wasn’t. For one, if it were all just memorization, then CoT prompting wouldn’t have provided any improvement, since CoT imitates natural-language reasoning, not a memorization technique.
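For anyone unfamiliar with the technique, here’s a minimal sketch of what the comparison looks like in practice, using the `openai` Python SDK. The model name and question are illustrative placeholders, not taken from any of the original experiments:

```python
# Minimal sketch: direct prompting vs. chain-of-thought prompting.
# Assumes the modern `openai` Python SDK; model name and question
# are illustrative placeholders only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = (
    "A farmer has 17 sheep. All but 9 run away. "
    "How many sheep are left?"
)

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Direct prompting: the model answers in one shot.
direct = ask(question + " Answer with a number only.")

# Chain-of-thought prompting: the model is nudged to spell out
# intermediate reasoning steps before committing to an answer.
cot = ask(question + " Let's think step by step.")

print("Direct:", direct)
print("CoT:", cot)
```

If answers were pure lookup, the “let’s think step by step” suffix shouldn’t change accuracy; empirically, it often does.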
Yeah, it’s looking like o1 is just quantitatively better at generalizing compared to GPT-3, not qualitatively better.