I strongly agree that this is a promising direction. It’s similar to the bet on supervising process we’re making at Ought.
In the terminology of this post, our focus is on creating externalized reasoners that are
- authentic (reasoning is legible, complete, and causally responsible for the conclusions), and
- competitive (results are as good as or better than those of end-to-end systems).
The main difference I see is that we’re avoiding end-to-end optimization over the reasoning process, whereas the agenda described here leaves this open. More specifically, we’re aiming for authenticity through factored cognition, i.e. breaking reasoning down into individual steps that don’t share the larger context (see the sketch after this list), because:
- it’s a way to enforce completeness and causal responsibility, and
- it scales to more complex tasks than append-only chain-of-thought-style reasoning.
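To make the factored-cognition point concrete, here is a minimal sketch of what we mean by steps that don’t share the larger context. The function names and the `ask_model` wrapper are hypothetical (not our actual codebase); the point is just that each substep only sees the question and the text it is explicitly passed, never the whole reasoning trace.

```python
# Minimal sketch of factored cognition: each substep is a function that only
# sees the inputs it is explicitly given, never the full context or trace.

def ask_model(prompt: str) -> str:
    """Placeholder for a single LLM call; swap in your provider's API."""
    raise NotImplementedError

def select_relevant_passages(paper_text: str, question: str) -> list[str]:
    # Substep 1: narrow the paper down to passages relevant to the question.
    response = ask_model(
        f"Question: {question}\n\nPaper:\n{paper_text}\n\n"
        "List, verbatim and one per line, the passages relevant to the question."
    )
    return [p for p in response.split("\n") if p.strip()]

def answer_from_passage(passage: str, question: str) -> str:
    # Substep 2: answer the question from a single passage.
    # This step never sees the other passages or the rest of the paper.
    return ask_model(f"Passage: {passage}\n\nQuestion: {question}\n\nAnswer:")

def aggregate(candidates: list[str], question: str) -> str:
    # Substep 3: combine per-passage answers into one final answer.
    joined = "\n".join(f"- {c}" for c in candidates)
    return ask_model(
        f"Question: {question}\n\nCandidate answers:\n{joined}\n\n"
        "Give the best-supported final answer."
    )

def answer_question(paper_text: str, question: str) -> str:
    passages = select_relevant_passages(paper_text, question)
    answers = [answer_from_passage(p, question) for p in passages]
    return aggregate(answers, question)
```

Because every substep is a small, context-limited call, the decomposition itself carries the reasoning rather than a hidden chain of thought, which is what makes completeness and causal responsibility enforceable.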
> Developing tools to automate the oversight of externalized reasoning.
Do you have more thoughts on what would be good to build here?
We’ve recently started building developer tools for our own use as we debug and oversee compositional reasoning. For example, we’re recording the function calls that correspond to substeps of reasoning, so that we can zoom in on a step, see what its inputs and outputs looked like, and find where things went wrong. We’ve applied this to a decomposition for the task “Did this paper use a placebo? If so, what was it?”.
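As a rough illustration of that kind of recording (hypothetical code, not our actual tooling; the decorator, trace format, and toy placebo heuristic are assumptions for the example), each substep can be wrapped so that its inputs, outputs, and position in the call tree are logged for later inspection:

```python
# Hypothetical sketch of substep tracing: wrap each reasoning substep so its
# inputs, outputs, and nesting are recorded for later inspection and debugging.
import functools
import json

TRACE: list[dict] = []   # flat log of recorded substep calls
_depth = 0               # current nesting depth within the decomposition

def record_substep(fn):
    """Record a substep's inputs, output, and depth in the call tree."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        global _depth
        entry = {"step": fn.__name__, "depth": _depth,
                 "inputs": {"args": args, "kwargs": kwargs}}
        _depth += 1
        try:
            result = fn(*args, **kwargs)
            entry["output"] = result
            return result
        except Exception as exc:
            entry["error"] = repr(exc)   # keep failed steps visible in the trace
            raise
        finally:
            _depth -= 1
            TRACE.append(entry)
    return wrapper

@record_substep
def find_placebo_mentions(paper_text: str) -> list[str]:
    # Substep: pull out sentences that mention a placebo (toy heuristic).
    return [s.strip() for s in paper_text.split(".") if "placebo" in s.lower()]

@record_substep
def did_paper_use_placebo(paper_text: str) -> str:
    # Top-level step for "Did this paper use a placebo? If so, what was it?"
    mentions = find_placebo_mentions(paper_text)
    return f"Yes: {mentions[0]}" if mentions else "No placebo found"

if __name__ == "__main__":
    did_paper_use_placebo("Patients received a saline placebo. Outcomes improved.")
    print(json.dumps(TRACE, indent=2, default=str))  # inspect each step's I/O
```

A trace like this is what lets us zoom in: when the final answer is wrong, we can find the specific substep whose inputs or outputs went off the rails.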