I like trying to define general reasoning; I also don’t have a good definition. I think it’s tricky.
The ability to do deduction, induction, and abduction.
I think you’ve got to define how well it does each of these. As you noted on that very difficult math benchmark comment, saying they can do general reasoning doesn’t mean doing it infinitely well.
The ability to do those in a careful, step by step way, with almost no errors (other than the errors that are inherent to induction and abduction on limited data).
I don’t know about this one. Humans seem to make a very large number of errors, but muddle through by recognizing at above-chance levels when they’re more likely to be correct, then building on those occasional successes. So I think there are two routes to useful general-purpose reasoning: doing it well, or being able to judge success at above-chance rates and then remember it for future use one way or another.
The ability to do all of that in a domain-independent way.
The ability to use all of that to build a self-consistent internal model of the domain under consideration.
Here again, I think we shouldn’t overestimate how self-consistent or complete a model humans use when they make progress on difficult problems. It’s consistent and complete enough, but probably far from perfect.
Yeah, very fair point; those are at least in part defining a scale rather than a threshold (especially the error-free and consistent-model criteria).