[Epistemic note: I’m going past my expertise here, just giving my own current understanding, I encourage people with more expertise on this (possibly including you) to correct me]
> Therefore: If we want an agent that can plan over an unbounded number of steps (e.g. one that does tree-search)
Merrill and Sabharwal have done some very useful-seeming work on how strong transformers can be, both under auto-regressive conditions (‘The Expressive Power of Transformers with Chain of Thought’) and in a single forward pass with assumptions about precision (e.g. ‘The Parallelism Tradeoff: Limitations of Log-Precision Transformers’), although I haven’t gone through it in detail. Certainly it’s unambiguously true that there are limitations to what can be done in a single forward pass.
As a side note, I’m not sure how tree search comes into play; in what way does tree search require unbounded steps that doesn’t apply equally to linear search?
> If we want an agent that can plan over an unbounded number of steps...it will need some component that can do an arbitrary number of iterative or recursive steps.
No finite agent, recursive or otherwise, can plan over an unbounded number of steps in finite time, so it’s not immediately clear to me how iteration/recursion is fundamentally different in practice. I think the better comparison is the power of a transformer under auto-regressive conditions (note that intermediate steps don’t need to be shown to a user as output). The first paper linked above shows that given polynomial intermediate steps, transformers can handle exactly the class of problems that can be solved in polynomial time, aka P. So under those conditions they’re pretty strong and general (which certainly doesn’t mean that LLMs trained on next-token prediction are that strong and general).
One useful way to think about it, for me, is that auto-regression is a form of recursion, albeit one that’s bounded in current architecture by the context length.
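To make that concrete, here’s a toy sketch (purely illustrative; the names `step` and `context_limit` are my own stand-ins for a single forward pass and the context length, nothing transformer-specific is modeled): a procedure that does a fixed, bounded amount of work per step can still carry out an arbitrarily long computation by feeding its own output back in as input, up to a context limit.

```python
# Toy illustration: bounded work per step, unbounded computation via
# feeding output back in -- the sense in which auto-regression acts
# like recursion, bounded by a context limit.

def step(scratch: str, next_bit: str) -> str:
    """One bounded 'forward pass': fold one input bit into the running parity."""
    return str(int(scratch) ^ int(next_bit))

def autoregressive_parity(bits: str, context_limit: int = 1024) -> str:
    """Parity of an arbitrary-length bit string via repeated bounded steps."""
    if len(bits) > context_limit:
        raise ValueError("computation exceeds the context window")
    scratch = "0"
    for b in bits:              # each iteration re-invokes the same fixed step
        scratch = step(scratch, b)
    return scratch

print(autoregressive_parity("1101"))  # three 1-bits -> "1"
```

Each call to `step` does the same constant amount of work; only the number of times it is re-invoked grows with the input, which is exactly the resource the context length caps.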
Thanks for the references; I’ll need some time to review them. In the meanwhile, I’ll make some quick responses.
> As a side note, I’m not sure how tree search comes into play; in what way does tree search require unbounded steps that doesn’t apply equally to linear search?
I intended tree search as just one example, since minimax tree search is a common example for game-based RL research.
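For concreteness, a minimal minimax sketch (my own toy version, not taken from any particular RL paper): the recursion depth is bounded only by the depth of the game tree, which is the sense in which tree search involves an a-priori unbounded number of recursive steps.

```python
# Minimal minimax over a hand-built game tree. The recursion bottoms out
# at leaves; its depth is whatever the tree's depth happens to be.

def minimax(state, is_max, children, value):
    """children(state) -> successor states; value(state) -> leaf score."""
    kids = children(state)
    if not kids:                        # leaf: evaluate directly
        return value(state)
    scores = [minimax(k, not is_max, children, value) for k in kids]
    return max(scores) if is_max else min(scores)

# Tiny example tree: ints are leaves, lists branch.
tree = [[3, 5], [2, [9, 1]]]
children = lambda s: s if isinstance(s, list) else []
value = lambda s: s
print(minimax(tree, True, children, value))  # -> 3
```

The same shape applies to linear search as a degenerate one-child tree, which is why the unbounded-steps point isn’t specific to trees.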
> No finite agent, recursive or otherwise, can plan over an unbounded number of steps in finite time...
In general, I agree. Though there are notable exceptions for cases where (not mutually exclusive):
- a closed-form solution is found (for example, where a time-based simulation can calculate some quantity at any arbitrary time step using the same amount of computation)
- approximate solutions using a fixed number of computation steps are viable
- a greedy algorithm can select the immediate next action that is equivalent to following a longer-term planning algorithm
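A toy sketch of the first exception (illustrative only; the function names are mine): a step-by-step simulation needs n iterations to reach step n, but when a closed form exists the same quantity falls out in a constant amount of computation, so the unbounded number of steps never has to be executed.

```python
# Iterative simulation vs. closed form for the recurrence x <- r * x.

def simulate(x0: float, r: float, n: int) -> float:
    """n explicit simulation steps."""
    x = x0
    for _ in range(n):
        x = r * x
    return x

def closed_form(x0: float, r: float, n: int) -> float:
    """Same quantity at any arbitrary step n, in one bounded computation."""
    return x0 * r ** n

print(simulate(1.0, 2.0, 10), closed_form(1.0, 2.0, 10))  # 1024.0 1024.0
```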
> … so it’s not immediately clear to me how iteration/recursion is fundamentally different in practice.
Yes, like I said above, I agree in general and see your point.
As I’m confident we both know, some algorithms can be written much more compactly when recursion/iteration is available. I don’t know how much computability theory has to say about this, i.e. which classes of problems this applies to and why; my intuitive guess is that it’s conceptually related to my point earlier about closed-form solutions.
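A rough illustration of the compactness point (mine, and only suggestive): the same computation written with and without recursion. Both compute the nesting depth of a list; the recursive version is shorter because the call stack does the bookkeeping that the explicit-stack version must manage itself.

```python
# The same computation, recursive vs. manually stack-managed.

def depth_recursive(x) -> int:
    """Nesting depth: non-lists are depth 0, a list is 1 + its deepest child."""
    if not isinstance(x, list):
        return 0
    return 1 + max((depth_recursive(c) for c in x), default=0)

def depth_iterative(x) -> int:
    """Same result, with an explicit stack replacing the call stack."""
    best = 0
    stack = [(x, 0)]
    while stack:
        node, d = stack.pop()
        if isinstance(node, list):
            best = max(best, d + 1)
            for c in node:
                stack.append((c, d + 1))
    return best

print(depth_recursive([1, [2, [3]]]), depth_iterative([1, [2, [3]]]))  # 3 3
```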
Totally, good point!