I think you have tied yourself too tightly to the strict binary classification you invented (finetuning vs. scaffolding). You overgeneralise, and the classification obscures the truth more than it clarifies things.
All the different things that can be done with LLMs: tool use, scaffolded reasoning (a.k.a. LM agents), RAG, fine-tuning, semantic knowledge graph mining, reasoning over semantic knowledge graphs, finetuning to follow a "virtue" (persona, character, role, style, etc.), finetuning for model checking, finetuning to provide heuristics for theorem proving, finetuning to generate causal models, (what else?). These just don't fit easily into two simple categories whose properties are consistent within each category.
But I don’t understand the sense in which you think finetuning in this context has completely different properties.
In the summary (note: I haven't actually read the rest of the post, only the summary), you write something that implies that finetuning is opaque or uninterpretable:
> From a safety perspective, language model agents whose agency comes from scaffolding look greatly superior to ones whose agency comes from finetuning

> Because you can get an extremely high degree of transparency by construction
But this totally doesn't apply to the other variants of finetuning I mentioned. If the LLM is a heuristic engine that generates mathematical proofs which are later verified with Lean, it simply stops making sense to ask how interpretable or transparent this theorem-proving or model-checking LLM-based heuristic engine is: its outputs are checked externally anyway.
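To make the point concrete, here is a minimal sketch of the generate-and-verify pattern I have in mind. The proposal function is a hypothetical stand-in for any (possibly finetuned, fully opaque) model, and the only assumption on the Lean side is that the `lean` CLI is on PATH and exits non-zero when a file fails to check:

```python
# Minimal sketch of generate-and-verify, not a real implementation.
# Assumptions: `propose_proof_candidates` stands in for any opaque LLM sampler,
# and `lean <file>` exits with code 0 when the file type-checks.

import subprocess
import tempfile
from pathlib import Path
from typing import List


def propose_proof_candidates(theorem_statement: str, n: int = 8) -> List[str]:
    """Hypothetical stub: sample n candidate Lean proofs from an LLM.
    Replace with an actual model call; its internals never need to be inspected."""
    return []  # placeholder so the sketch runs end to end


def lean_accepts(source: str) -> bool:
    """Return True iff the external Lean checker accepts the candidate proof."""
    with tempfile.NamedTemporaryFile("w", suffix=".lean", delete=False) as f:
        f.write(source)
        path = Path(f.name)
    result = subprocess.run(["lean", str(path)], capture_output=True)
    return result.returncode == 0


def verified_proofs(theorem_statement: str) -> List[str]:
    """Keep only candidates that pass the verifier: soundness comes from Lean,
    not from trusting or interpreting the proposal model."""
    return [p for p in propose_proof_candidates(theorem_statement) if lean_accepts(p)]
```

In this setup the safety-relevant property lives in the external checker, so the transparency of the finetuned proposal model is largely beside the point.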