I think current LLMs do have a form of recurrence, since the generated tokens are fed back as input to the next forward pass of the DNN.
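To make the output-to-input loop concrete, here is a minimal sketch in Python. The function `toy_next_token` is a hypothetical stand-in for a forward pass of the network (it is not any real model API), and the names `generate`, `max_new` and `eos` are assumptions made for illustration only.

```python
from typing import Callable, List


def generate(next_token: Callable[[List[int]], int],
             prompt: List[int],
             max_new: int,
             eos: int) -> List[int]:
    """Autoregressive loop: every new token is appended to the context
    and becomes part of the input to the next pass."""
    tokens = list(prompt)
    for _ in range(max_new):
        tok = next_token(tokens)  # one "forward pass" over everything so far
        tokens.append(tok)        # the output is fed back as input
        if tok == eos:            # stop when the end-of-sequence token appears
            break
    return tokens


def toy_next_token(tokens: List[int]) -> int:
    # Hypothetical stand-in for the DNN: sum of the context modulo 10.
    return sum(tokens) % 10


if __name__ == "__main__":
    print(generate(toy_next_token, [1, 2], max_new=4, eos=0))
```

The only state carried between passes is the growing token list, which is the sense in which written-out "thinking" can serve as working memory.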
From what I have observed, they do better on planning, inference, and even coding tasks if they start by “thinking out loud” step by step, explaining the steps of the plan, of the inference, or the details of the code to write. If you ask GPT-4 for something non-trivial and substantially different from code that can be found in public repositories, it tends to write a plan first. If you then ask it in a separate thread to produce only the code, without any description, the first solution (the one with a bit of planning) is usually better and less error-prone than the second. They also work much better if you specify simple steps to translate into code rather than a more abstract description (in the case of writing code without planning). This suggests that LLMs cannot internally span a long tree of possibilities to check, as a direct agent does, but they can use the recurrence of token output-to-input to do similar work.
The biggest differences that I see are:
direct optimization processes are ultra-fast at searching the tree of possibilities, but computationally not fast at discriminating good enough solutions from worse ones
amortised optimizers, on the other hand, are very slow if direct search is needed. Right now GPT-4 is maybe a bit faster than a regular human, but also a bit more error-prone, especially in more complex inference.
amortised optimizers are faster at finding a good enough solution through generalized heuristics, without the need for direct search (or with only a small amount of it at a higher abstraction level)
amortised optimizers like LLMs can group steps or outcomes into more abstract groups, as humans do, and work on those groups instead of directly on every possible action and outcome
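The contrast in the list above can be illustrated with a toy problem: reach a target number from a start number using only “+1” and “*2”. This is only a sketch under stated assumptions. The breadth-first search stands in for a direct optimizer, and the hand-written rule in `amortised_policy` stands in for a learned heuristic (a real amortised optimizer would have learned it from data). All names here are hypothetical.

```python
from collections import deque
from typing import List, Optional, Tuple

ACTIONS = {"+1": lambda x: x + 1, "*2": lambda x: x * 2}


def direct_search(start: int, target: int) -> Tuple[Optional[List[str]], int]:
    """Direct optimizer: breadth-first search over the tree of actions.
    Returns (shortest plan, number of states expanded)."""
    queue = deque([(start, [])])
    seen = {start}
    expanded = 0
    while queue:
        state, plan = queue.popleft()
        expanded += 1
        if state == target:
            return plan, expanded
        for name, fn in ACTIONS.items():
            nxt = fn(state)
            if nxt <= target and nxt not in seen:  # prune overshoots and repeats
                seen.add(nxt)
                queue.append((nxt, plan + [name]))
    return None, expanded


def amortised_policy(start: int, target: int) -> List[str]:
    """Amortised optimizer (stand-in): one cheap heuristic decision per step,
    with no search. Assumes 1 <= start <= target."""
    state, plan = start, []
    while state != target:
        name = "*2" if state * 2 <= target else "+1"
        state = ACTIONS[name](state)
        plan.append(name)
    return plan


def apply_plan(start: int, plan: List[str]) -> int:
    """Replay a plan to check where it ends up."""
    for name in plan:
        start = ACTIONS[name](start)
    return start


if __name__ == "__main__":
    best, expanded = direct_search(1, 10)
    quick = amortised_policy(1, 10)
    print(best, expanded)  # optimal plan, found by expanding many states
    print(quick)           # valid plan, found with one decision per step
```

For a target of 10 the search returns a four-step plan, while the heuristic returns a five-step plan after far fewer evaluations: good enough, cheap, and slightly worse, which is the trade-off described above.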
What I’m more worried about is a closer hybridization of direct and amortised optimizers. I can imagine an architecture with a direct optimizer that, instead of generating and searching an impossibly vast tree of possibilities, uses a DNN model to generate fewer options. Instead of generating thousands of detailed moves like “move 5 meters”, “take that thing”, “put it there” and optimizing over them, it would generate more abstract plan points specified by an LLM, together with a prediction of each step’s outcome, and then evaluate how well that outcome serves the goal. This way it could plan at a more abstract level, like humans do, to narrow down a general plan or a list of partial goals that lead to the “final goal” or to the “best path” (if its value function is more like an integral over time than a single final target). In other words, it would find a good strategy, and with enough time it might even be a complex and indirect one. Then it could plan tactics for the first step in the same way, but at a lower level of abstraction. Then it could plan the direct move that realises the first step of the current tactics, and run it. It might have several subprocesses that asynchronously work out the strategy based on the general state and goal, the current tactics based on a more detailed state and the current strategic goal to pursue, and the current moves based on the current tactical plan. There could be any number of abstraction and detail levels (2-3 seems typical for humans, but an AI might have more). This kind of agent might behave more like a direct optimizer, even if it uses an LLM and DNNs inside for some parts. Direct optimization would be in the driver’s seat in such an agent.
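A minimal sketch of the hybrid described above, under loud assumptions: `propose` is a hypothetical stand-in for the LLM that returns a handful of abstract options together with their predicted outcomes, `value` is a hypothetical goal evaluator, and the state is just an integer. The outer loop is plain direct search, but it only ever looks at what the proposer offers.

```python
from typing import Callable, List, Tuple

State = int
Option = Tuple[str, State]  # (abstract step, predicted outcome of that step)


def plan(state: State,
         propose: Callable[[State], List[Option]],
         value: Callable[[State], float],
         depth: int) -> Tuple[float, List[str]]:
    """Depth-limited direct search over proposed options only.
    Returns (best predicted value, plan that is predicted to reach it)."""
    best_score, best_plan = value(state), []  # stopping here is always allowed
    if depth == 0:
        return best_score, best_plan
    for name, predicted in propose(state):
        score, rest = plan(predicted, propose, value, depth - 1)
        if score > best_score:
            best_score, best_plan = score, [name] + rest
    return best_score, best_plan


# --- toy stand-ins, purely illustrative ---
GOAL = 10


def toy_propose(state: State) -> List[Option]:
    # Stand-in for the LLM: two abstract options with predicted outcomes.
    return [("double", state * 2), ("increment", state + 1)]


def toy_value(state: State) -> float:
    # Stand-in for the goal evaluator: closer to GOAL is better.
    return -abs(GOAL - state)


if __name__ == "__main__":
    score, steps = plan(1, toy_propose, toy_value, depth=4)
    print(score, steps)
```

The same function could in principle be called again at a lower level of abstraction, taking the first step of the returned plan as the new goal, which would correspond to the strategy, tactics and moves layers. That layering is not implemented here.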
I don’t think this will come out of research at OpenAI or similar laboratories any time soon. It might, but if I had to guess, it would more likely be an LLM or other DNN model “on top” that is connected to other models to “use at will”. For example, it is rather easy to connect GPT-4 so that it can use other models or APIs (such as a database or search). So this is very low-hanging fruit for current AI development. I expect the next step will be connecting it to more modalities and other models. That is already happening.
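As a rough illustration of a model “on top” that uses other components at will, here is a sketch of a tool-dispatch loop. Everything in it is an assumption made for illustration: `toy_model` is a stub rather than a real GPT-4 call, the `CALL name argument` text convention is invented for this example, and `TOOLS` is a tiny stand-in for a database or search API.

```python
from typing import Callable, Dict


def run_with_tools(model: Callable[[str], str],
                   tools: Dict[str, Callable[[str], str]],
                   question: str,
                   max_steps: int = 5) -> str:
    """Let the model either answer or request a tool; tool results are
    appended to the context and the model is asked again."""
    context = question
    for _ in range(max_steps):
        reply = model(context)
        if reply.startswith("CALL "):
            _, name, arg = reply.split(" ", 2)  # hypothetical "CALL name arg" format
            result = tools[name](arg)
            context += f"\n[{name}({arg}) -> {result}]"
        else:
            return reply
    return "no answer within step limit"


# --- toy stand-ins, purely illustrative ---
FACTS = {"capital_of_france": "Paris"}
TOOLS = {"lookup": lambda key: FACTS.get(key, "unknown")}


def toy_model(context: str) -> str:
    # Stub model: asks for a lookup first, then answers from the tool result.
    if "->" not in context:
        return "CALL lookup capital_of_france"
    return "Answer: " + context.rsplit("-> ", 1)[1].rstrip("]")


if __name__ == "__main__":
    print(run_with_tools(toy_model, TOOLS, "What is the capital of France?"))
```

Note that the model stays in control here: it decides when to call a tool and what to do with the result, which is the opposite arrangement from a direct optimizer calling a model as a subroutine.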
I do think, though, that this more direct agent might come out of work done by the military. The direct approach is much more reliable, and reliability is one of the key values for military-grade equipment. I only hope they will take the danger of such an approach seriously.