Lots of important phenomena have a critical threshold. In a nuclear weapon, each fission event produces a certain number of neutrons, and some of those trigger further fission events. If each event triggers slightly more than one further event on average, the reaction grows exponentially. If slightly fewer, it fizzles out.
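To put rough numbers on that threshold, here is a minimal sketch. It assumes an idealized chain reaction in which every event independently triggers an average of `k` further events, so that generation n contains k**n events on average. The name `k` and the function `expected_events` are purely illustrative and not taken from any physics library.

```python
def expected_events(k: float, generations: int) -> float:
    """Expected number of events in a given generation, starting from one event.

    Assumes an idealized branching process: each event triggers an average
    of k further events, independently of all the others.
    """
    return k ** generations


if __name__ == "__main__":
    # Compare values of k just below, at, and just above the threshold k = 1.
    for k in (0.95, 1.00, 1.05):
        count = expected_events(k, 200)
        print(f"k = {k:.2f}: about {count:.3g} events in generation 200")
```

In this toy model, after 200 generations a 5% difference on either side of k = 1 separates a reaction that has essentially died out from one that has grown more than ten-thousand-fold.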
The same is true in quantum computing. Current quantum computers struggle with noise, which causes superpositions to break down over time. However, if we can keep the error rate low enough, it should be possible to use error-correcting codes to do arbitrarily complicated calculations.
When trying to extend LLMs to difficult multi-step problems, I often feel like I’m dealing with a similar phenomenon. For example, if you ask an LLM to write a novel, it will follow the plot of the novel for a while and then spontaneously jump to a different story. It feels like the “amount of information” passed from one state to the next is not quite enough to keep the story going indefinitely. LLM agents struggle with similar problems: they seem to work for a while, but eventually they get stuck in a loop or lose their train of thought.
It seems like there are two ways this behavior could change as we scale up LLMs:
1. LLMs get gradually better as we increase their capabilities (they go from being able to write 1 page to writing 2 to writing 3...).
2. There is some “critical size” threshold above which agents are able to self-improve without limit, and suddenly we go from writing pages to writing entire encyclopedias.
Does anyone know of good evidence for or against either of these cases? (The strongest evidence in favor of 1 seems to be “that’s how it’s gone so far”.)
I think that if we retain the architecture of current LLMs, we will be in world one. I have two reasons.
First, the architecture of current LLMs places a limit on how much information they can retain about the task at hand. They have memory of a prompt (both the system prompt and your task-specific prompt) plus the memory of everything they’ve said so far. When what they’ve said so far gets long enough, they attend mostly to what they’ve already said, rather than attending to the prompt. Then they wander off into La-La land.
Second, the problem may also be inherent in their training methods. In the first (and largest) part of their training, they’re trained to predict the next word from a snippet of English text. A few years ago, these snippets were a sentence or a paragraph. They’ve gotten longer recently, but I don’t think they amount to entire books yet (readers, please tell us if you know). So the models have never seen a text that’s coherent over longer than the snippet length. It seems unsurprising that they don’t know how to remain coherent indefinitely.
People have tried preventing these phenomena by various schemes, such as telling the LLM to prepare summaries for later expansion, or periodically reminding it of the task at hand. So far these haven’t been enough to make indefinitely long tasks feasible. Of course, there are lots of smart people working on this, and we could transition from world one to world two at any moment.
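For concreteness, here is a minimal sketch of the “periodically remind it of the task” scheme mentioned above. Everything in it is an assumption made for illustration: `generate` is a hypothetical stand-in for whatever call sends a prompt to an LLM and returns its completion (it is not a real library API), and the prompt wording, `write_long_text`, and `recent_chars` are made up. It shows the shape of the scheme, not a fix for the problem.

```python
from typing import Callable, List


def write_long_text(
    task: str,
    generate: Callable[[str], str],
    num_chunks: int,
    recent_chars: int = 2000,
) -> List[str]:
    """Generate a long text chunk by chunk, restating the task before each chunk.

    `generate` is a hypothetical stand-in for an LLM call: it takes a prompt
    string and returns the model's completion as a string.
    """
    chunks: List[str] = []
    for i in range(num_chunks):
        # Include only the tail of the text so far, so the task reminder
        # is not crowded out by the model's own earlier output.
        recent = "".join(chunks)[-recent_chars:] if recent_chars > 0 else ""
        prompt = (
            f"Task (reminder): {task}\n"
            f"Text so far (most recent part):\n{recent}\n"
            f"Write part {i + 1} of {num_chunks}."
        )
        chunks.append(generate(prompt))
    return chunks


if __name__ == "__main__":
    # A fake model for demonstration; a real one would call an LLM here.
    def fake_generate(prompt: str) -> str:
        return f"[reply to a {len(prompt)}-character prompt] "

    for chunk in write_long_text("Write a short story.", fake_generate, 3):
        print(chunk)
```

The design choice is that the task statement is re-sent in full at every step while the model's own output is truncated, which is one way to keep the prompt from being swamped. As noted above, schemes like this have not so far been enough to make indefinitely long tasks feasible.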