Agreed, the competitiveness penalty from enforcing internal legibility is the main concern with externalized reasoning / factored cognition. The secular trend in AI systems is towards end-to-end training and human-uninterpretable intermediate representations; while you can always do slightly better at the frontier by adding some human-understandable components like chain of thought (previously beam search & probabilistic graphical models), in the long run a bigger end-to-end model will win out.
One hope that “externalized reasoning” can buck this trend rests on the possibility that success in “particularly legible domains, such as math proofs and programming” is actually enough for transformative AI—thanks to the internet and especially the rise of remote work, a large share of the economy is legible. Sure, your nuclear-fusion-controller AI will have a huge competitiveness penalty if you force it to explain what it’s doing in natural language, but physical control isn’t where we’ve seen AI successes anyway.
Side note:
standard training procedures only incentivize the model to use reasoning steps produced by a single human.
I don’t think this is right! The model will have seen enough examples of dialogue and conversation transcripts; it can definitely generate outputs that involve multiple domains of knowledge from prompts like
An economist and a historian are debating the causes of WW2.
That said, in the “economist and historian” case, it will only synthesize their knowledge together as much as those humans would, and humans are pretty suboptimal at integrating others’ opinions.
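For concreteness, here’s a minimal sketch of the kind of multi-persona prompting I have in mind, using the Hugging Face `transformers` text-generation pipeline (the `gpt2` checkpoint is just a stand-in for illustration; the claim only really bites for much larger models):

```python
# Minimal sketch: a single prompt invoking two personas from different
# domains, completed by an off-the-shelf causal language model.
from transformers import pipeline

# Model choice is illustrative; any sufficiently large causal LM would do.
generator = pipeline("text-generation", model="gpt2")

prompt = (
    "An economist and a historian are debating the causes of WW2.\n"
    "Economist:"
)

# The continuation can draw on both personas' domains of knowledge, but it
# will only integrate them about as well as a transcript of two humans would.
completion = generator(prompt, max_new_tokens=100, do_sample=True)
print(completion[0]["generated_text"])
```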