That all seems pretty right to me. It continues to be difficult to fully define ‘general reasoning’, and my mental model of it continues to evolve, but I think of ‘system 2 reasoning’ as at least a partial synonym.
Humans clearly can do general reasoning. But it’s not easy for us.
In the medium-to-long term I’m inclined to taboo the word and talk about what I understand as its component parts, which I currently (off the top of my head) think of as something like:
The ability to do deduction, induction, and abduction.
The ability to do those in a careful, step by step way, with almost no errors (other than the errors that are inherent to induction and abduction on limited data).
The ability to do all of that in a domain-independent way.
The ability to use all of that to build a self-consistent internal model of the domain under consideration.
Don’t hold me to that, though, it’s still very much evolving. I may do a short-form post with just the above to invite critique.
I like trying to define general reasoning; I also don’t have a good definition. I think it’s tricky.
The ability to do deduction, induction, and abduction.
I think you’ve got to define how well it does each of these. As you noted in your comment about that very difficult math benchmark, saying they can do general reasoning doesn’t mean doing it infinitely well.
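To make that concrete, here’s a toy sketch of the three modes (an invented example; nothing here comes from the benchmark discussion). Only the deductive one is guaranteed correct given its premises; the other two are exactly where ‘how well’ becomes a matter of degree:

```python
# Toy sketch of the three inference modes (invented example).

# Deduction: the conclusion is guaranteed by the premises.
def deduce(all_ravens_black: bool, x_is_a_raven: bool) -> bool:
    # "x is black" follows with certainty if both premises hold.
    return all_ravens_black and x_is_a_raven

# Induction: generalize from observed cases; can fail on unseen data.
def induce(observed_raven_colors: list) -> str:
    if all(color == "black" for color in observed_raven_colors):
        return "all ravens are black"  # holds only provisionally
    return "raven color varies"

# Abduction: pick the hypothesis that best explains an observation;
# also fallible (a sprinkler explains a wet lawn as well as rain does).
def abduce(observation: str) -> str:
    best_explanations = {"wet lawn": "it rained"}
    return best_explanations.get(observation, "no good explanation")

print(deduce(True, True))                   # True, and necessarily so
print(induce(["black", "black", "black"]))  # a guess new data could break
print(abduce("wet lawn"))                   # a guess among rival explanations
```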
The ability to do those in a careful, step by step way, with almost no errors (other than the errors that are inherent to induction and abduction on limited data).
I don’t know about this one. Humans seem to make a very large number of errors, but muddle through by recognizing, at above-chance rates, when they’re more likely to be correct, then building on that occasional success. So I think there are two routes to useful general-purpose reasoning: doing it well, or judging success at above-chance rates and then remembering it for future use one way or another.
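A quick toy simulation of those two routes (every number here is invented for illustration): over a 20-step chain, per-step errors compound multiplicatively, but an above-chance judge plus retries recovers a surprising amount of the gap:

```python
import random

random.seed(0)

STEPS = 20        # length of a reasoning chain
P_GOOD = 0.95     # per-step accuracy for the "doing it well" route
P_NOISY = 0.70    # per-step accuracy for the error-prone route
P_JUDGE = 0.80    # chance the noisy reasoner judges a step correctly
TRIALS = 10_000

def one_pass(p_step):
    """Unverified chain: every step must come out right."""
    return all(random.random() < p_step for _ in range(STEPS))

def judged_pass(p_step, p_judge, retries=5):
    """Noisy steps, but an above-chance judge triggers retries."""
    for _ in range(STEPS):
        accepted_correct = False
        for _ in range(retries):
            step_right = random.random() < p_step
            # The judge's endorsement matches reality with
            # probability p_judge; otherwise it's flipped.
            judge_endorses = step_right if random.random() < p_judge else not step_right
            if judge_endorses:
                accepted_correct = step_right
                break
        if not accepted_correct:
            return False
    return True

print(sum(one_pass(P_GOOD) for _ in range(TRIALS)) / TRIALS)   # ~0.36
print(sum(one_pass(P_NOISY) for _ in range(TRIALS)) / TRIALS)  # ~0.0008
print(sum(judged_pass(P_NOISY, P_JUDGE) for _ in range(TRIALS)) / TRIALS)  # ~0.11
```

The jump from the second number to the third is what above-chance self-judgment buys; it’s also, roughly, the thing scaffolding tries to provide from the outside.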
The ability to do all of that in a domain-independent way.
The ability to use all of that to build a self-consistent internal model of the domain under consideration.
Here again, I think we shouldn’t overestimate how self-consistent or complete a model humans use when they make progress on difficult problems. It’s consistent and complete enough, but probably far from perfect.
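For an (invented) picture of what ‘self-consistent enough’ might mean, here’s a brute-force consistency check over a toy belief set; one bad belief makes the whole model unsatisfiable, and dropping it restores consistency:

```python
from itertools import product

# Toy "internal model" as constraints over three claims (invented example).
claims = ["A", "B", "C"]
constraints = [
    lambda v: (not v["A"]) or v["B"],  # believes A implies B
    lambda v: (not v["B"]) or v["C"],  # believes B implies C
    lambda v: v["A"],                  # believes A
    lambda v: not v["C"],              # ...and also denies C: contradiction
]

def is_consistent(constraints, claims):
    """Brute-force: does any truth assignment satisfy every constraint?"""
    for values in product([True, False], repeat=len(claims)):
        v = dict(zip(claims, values))
        if all(c(v) for c in constraints):
            return True
    return False

print(is_consistent(constraints, claims))      # False: the model contradicts itself
print(is_consistent(constraints[:3], claims))  # True: drop one belief and it's fine
```

Whatever humans actually do is presumably far cheaper and more local than this, which would be why the models we work with are only consistent in patches.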
Agreed; not only are we very limited at it, but we often aren’t doing it at all.
I agree that it may be possible to achieve it with scaffolding even if LLMs don’t get there on their own; I’m just less certain of it.
Yeah, very fair point, those are at least in part defining a scale rather than a threshold (especially the error-free and consistent-model criteria).