This is the big takeaway here: search is a notable capabilities improvement on its own, but it still needs compute scaling to get better results.
But the other takeaway, based on its performance on several benchmarks, is that adding search turned out to be much easier than Francois Chollet thought it would be, and it’s looking like compute and data are the hard parts of getting intelligence into LLMs, not the search and algorithm parts.
This is just another point on the trajectory of LLMs becoming more and more general reasoners, rather than just memorizing their training data.
I was just amused to see a tweet from Subbarao Kambhampati in which he essentially speculates that o1 is doing search and planning in a way similar to AlphaGo... accompanied by a link to his ‘LLMs Can’t Plan’ paper.
I think we’re going to see some goalpost-shifting from a number of people in the ‘LLMs can’t reason’ camp.
I agree with this, and I think that o1 is clearly a case where a lot of people will try to shift the goalposts even as AI gets more and more capable and runs more and more of the economy.
It’s looking like the hard part isn’t the algorithmic or data parts, but the compute part of AI.
This is the first model where we have strong evidence that the LLM is actually reasoning/generalizing and not just memorizing its data.
Really? There were many examples where even GPT-3 solved simple logic problems that couldn’t be explained by having the solution memorized. The effectiveness of chain-of-thought prompting was discovered when GPT-3 was current. GPT-4 could do fairly advanced math problems, explain jokes, etc.
The o1-preview model exhibits a substantive improvement in CoT reasoning, but arguably not something fundamentally different.
True enough, and I should probably rewrite the claim.
Though what was the logic problem that was solved without memorization?
I don’t remember exactly, but there were debates (e.g. involving Gary Marcus) on whether GPT-3 was merely a stochastic parrot or not, based on various examples. The consensus here was that it wasn’t. For one, if it was all just memorization, then CoT prompting wouldn’t have provided any improvement, since CoT imitates natural language reasoning, not a memorization technique.
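To make that point concrete, here’s a minimal sketch of the difference between a direct prompt and a chain-of-thought prompt. It assumes the current OpenAI Python SDK; the model name and the example question are just placeholders, not anything from the papers or debates above.

```python
# Minimal sketch: direct prompt vs. chain-of-thought prompt.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

question = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 more than "
    "the ball. How much does the ball cost?"
)

# Direct prompt: the model has to produce the answer in one shot.
direct = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": question + " Answer with just the number."}],
)

# Chain-of-thought prompt: the model is asked to reason in natural language
# before answering. If performance were pure memorization, this extra
# reasoning text shouldn't help, yet empirically it does.
cot = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{
        "role": "user",
        "content": question + " Let's think step by step, then state the final answer.",
    }],
)

print("Direct:", direct.choices[0].message.content)
print("CoT:   ", cot.choices[0].message.content)
```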
Yeah, it’s looking like o1 is just quantitatively better at generalizing than GPT-3, not qualitatively better.