A new kind of thing often finds its natural role only once it is instantiated as many tiny gears in a vast machine, and people gain experience with various designs of the machines that make use of it. Calling an arrangement of LLM calls a “Scaffolded LLM” is like calling a computer program running on an OS a “Scaffolded system call”. A program is not primarily about the system calls it uses to communicate with the OS, and a “Scaffolded LLM” is not primarily about the LLMs it uses to implement many of its subroutines. It’s more of a legible/interpretable/debuggable cognitive architecture, a program in the usual sense that describes what the whole thing does; only incidentally does it need to rely on unreliable reasoning engines, the LLMs, to take its magical reasoning steps.
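To make the analogy concrete, here is a minimal Python sketch of that framing (the `llm()` helper is hypothetical, a stand-in for whichever model API you use, not any particular library): the legible artifact is the ordinary program, and the LLM calls are just subroutines it happens to implement with an unreliable reasoning engine.

```python
def llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to some language model API."""
    raise NotImplementedError("wire up a model of your choice here")

def summarize(document: str) -> str:
    # One "magical reasoning step", isolated behind an ordinary interface.
    return llm(f"Summarize in one paragraph:\n\n{document}")

def answer_question(document: str, question: str) -> str:
    # The control flow is plain, inspectable code: each step can be logged,
    # tested, and debugged, even though the steps themselves are LLM calls.
    summary = summarize(document)
    draft = llm(f"Using this summary:\n{summary}\n\nAnswer: {question}")
    check = llm(f"Does this answer follow from the summary? yes/no\n\n{draft}")
    return draft if check.strip().lower().startswith("yes") else "unsure"
```

What gets summarized, what gets checked, and what happens on failure all live in the program, not inside the model, which is the sense in which the whole thing is a cognitive architecture rather than a scaffold around one LLM.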
(A relevant reference that seems to be missing is Conjecture’s Cognitive Emulation (CoEm) proposal, which fits as an example of a “Scaffolded LLM” and is explicitly concerned with minimizing reliance on the properties of the LLM invocations it would need in order to function.)
Thank you for the feedback. I’m definitely not sold on any particular terminology and was just aiming to keep things as compatible as possible with existing work.
I wasn’t that familiar with Conjecture’s work on CoEm, although I had read that outline. It was not immediately obvious to me that their work involved LLMs.
More details on CoEm currently seem to be scattered across various podcasts with Connor Leahy, though a writeup might eventually materialize. I like this snippet (4 minutes, starting at 49:21).
On the terminology front, I’m suggesting language model agent (LMA) as a clear reference to LLMs and to their agentic extension. Language model cognitive architecture (LMCA) is more precise but less intuitive. I’m suggesting LLM+ for the broad category of additions to LLMs, including tools and agentic script wrappers like AutoGPT.
This is probably worth a whole post, but FWIW.