That all seems pretty right to me. It continues to be difficult to fully define ‘general reasoning’, and my mental model of it continues to evolve, but I think of ‘system 2 reasoning’ as at least a partial synonym.
Humans clearly can do general reasoning. But it’s not easy for us.
In the medium-to-long term I’m inclined to taboo the word and talk about what I understand as its component parts, which I currently (off the top of my head) think of as something like:
The ability to do deduction, induction, and abduction.
The ability to do those in a careful, step by step way, with almost no errors (other than the errors that are inherent to induction and abduction on limited data).
The ability to do all of that in a domain-independent way.
The ability to use all of that to build a self-consistent internal model of the domain under consideration.
Don’t hold me to that, though, it’s still very much evolving. I may do a short-form post with just the above to invite critique.
I like trying to define general reasoning; I also don’t have a good definition. I think it’s tricky.
The ability to do deduction, induction, and abduction.
I think you’ve got to define how well it does each of these. As you noted in your comment about that very difficult math benchmark, saying they can do general reasoning doesn’t mean doing it infinitely well.
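To make that concrete, here’s a toy sketch of the three modes (an invented example; nothing here comes from the benchmark discussion). Only the deductive one is guaranteed correct given its premises; the other two are exactly where ‘how well’ becomes a matter of degree:

```python
# Toy sketch of the three inference modes (invented example).

# Deduction: the conclusion is guaranteed by the premises.
def deduce(all_ravens_black: bool, x_is_a_raven: bool) -> bool:
    # "x is black" follows with certainty if both premises hold.
    return all_ravens_black and x_is_a_raven

# Induction: generalize from observed cases; can fail on unseen data.
def induce(observed_raven_colors: list) -> str:
    if all(color == "black" for color in observed_raven_colors):
        return "all ravens are black"  # holds only provisionally
    return "raven color varies"

# Abduction: pick the hypothesis that best explains an observation;
# also fallible (a sprinkler explains a wet lawn as well as rain does).
def abduce(observation: str) -> str:
    best_explanations = {"wet lawn": "it rained"}
    return best_explanations.get(observation, "no good explanation")

print(deduce(True, True))                   # True, and necessarily so
print(induce(["black", "black", "black"]))  # a guess new data could break
print(abduce("wet lawn"))                   # a guess among rival explanations
```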
The ability to do those in a careful, step by step way, with almost no errors (other than the errors that are inherent to induction and abduction on limited data).
I don’t know about this one. Humans seem to make a very large number of errors, but muddle through by recognizing, at above-chance rates, when they’re more likely to be correct, then building on that occasional success. So I think there are two routes to useful general-purpose reasoning: doing it well, or judging success at above-chance rates and then remembering it for future use one way or another.
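A quick toy simulation of those two routes (every number here is invented for illustration): over a 20-step chain, per-step errors compound multiplicatively, but an above-chance judge plus retries recovers a surprising amount of the gap:

```python
import random

random.seed(0)

STEPS = 20        # length of a reasoning chain
P_GOOD = 0.95     # per-step accuracy for the "doing it well" route
P_NOISY = 0.70    # per-step accuracy for the error-prone route
P_JUDGE = 0.80    # chance the noisy reasoner judges a step correctly
TRIALS = 10_000

def one_pass(p_step):
    """Unverified chain: every step must come out right."""
    return all(random.random() < p_step for _ in range(STEPS))

def judged_pass(p_step, p_judge, retries=5):
    """Noisy steps, but an above-chance judge triggers retries."""
    for _ in range(STEPS):
        accepted_correct = False
        for _ in range(retries):
            step_right = random.random() < p_step
            # The judge's endorsement matches reality with
            # probability p_judge; otherwise it's flipped.
            judge_endorses = step_right if random.random() < p_judge else not step_right
            if judge_endorses:
                accepted_correct = step_right
                break
        if not accepted_correct:
            return False
    return True

print(sum(one_pass(P_GOOD) for _ in range(TRIALS)) / TRIALS)   # ~0.36
print(sum(one_pass(P_NOISY) for _ in range(TRIALS)) / TRIALS)  # ~0.0008
print(sum(judged_pass(P_NOISY, P_JUDGE) for _ in range(TRIALS)) / TRIALS)  # ~0.11
```

The jump from the second number to the third is what above-chance self-judgment buys; it’s also, roughly, the thing scaffolding tries to provide from the outside.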
The ability to do all of that in a domain-independent way.
The ability to use all of that to build a self-consistent internal model of the domain under consideration.
Here again, I think we shouldn’t overestimate how self-consistent or complete a model humans use when they make progress on difficult problems. It’s consistent and complete enough, but probably far from perfect.
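For an (invented) picture of what ‘self-consistent enough’ might mean, here’s a brute-force consistency check over a toy belief set; one bad belief makes the whole model unsatisfiable, and dropping it restores consistency:

```python
from itertools import product

# Toy "internal model" as constraints over three claims (invented example).
claims = ["A", "B", "C"]
constraints = [
    lambda v: (not v["A"]) or v["B"],  # believes A implies B
    lambda v: (not v["B"]) or v["C"],  # believes B implies C
    lambda v: v["A"],                  # believes A
    lambda v: not v["C"],              # ...and also denies C: contradiction
]

def is_consistent(constraints, claims):
    """Brute-force: does any truth assignment satisfy every constraint?"""
    for values in product([True, False], repeat=len(claims)):
        v = dict(zip(claims, values))
        if all(c(v) for c in constraints):
            return True
    return False

print(is_consistent(constraints, claims))      # False: the model contradicts itself
print(is_consistent(constraints[:3], claims))  # True: drop one belief and it's fine
```

Whatever humans actually do is presumably far cheaper and more local than this, which would be why the models we work with are only consistent in patches.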
Agreed; not only are we very limited at it, but we often aren’t doing it at all.
I agree that it may be possible to achieve it with scaffolding even if LLMs don’t get there on their own; I’m just less certain of it.
Yeah, very fair point, those are at least in part defining a scale rather than a threshold (especially the error-free and consistent-model criteria).