I’ll bite even further, and ask for the concept of “recurrence” itself to be dumbed down. What is “recurrence”, why is it important, and in what sense does e.g. a feedforward network hooked up to something like MCTS not qualify as relevantly “recurrent”?
“Hooked up to something” might make a difference.
(To me, one important aspect is whether computation is fundamentally limited to a fixed number of steps vs. having a potentially unbounded loop.
The autoregressive version is an interesting compromise: it’s a fixed number of steps per token, but the answer can unfold in an unbounded fashion.
An interesting tidbit here is that for traditional RNNs it is one loop iteration per input token, whereas in autoregressive Transformers it is one loop iteration per output token.)
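To make that loop-structure contrast concrete, here's a minimal sketch in plain Python. The `step` and `next_token` callables are hypothetical stand-ins for the actual network computation; the point is only where the loop bound comes from in each case.

```python
def rnn_process(input_tokens, step, h0):
    """Traditional RNN: exactly one loop iteration per *input* token.
    The number of steps is fixed by the length of the input."""
    h = h0
    for x in input_tokens:     # loop bound = len(input_tokens)
        h = step(h, x)         # one state update per input token
    return h

def autoregressive_generate(prompt, next_token, eos):
    """Autoregressive Transformer-style decoding: one loop iteration per
    *output* token. Each iteration does a fixed amount of work, but the
    loop itself can run unboundedly until a stop token is emitted."""
    tokens = list(prompt)
    while True:
        t = next_token(tokens)  # fixed number of steps per token
        if t == eos:
            break
        tokens.append(t)        # the answer unfolds open-endedly
    return tokens

# Toy demo with stand-in "models", just to exercise the two loops:
if __name__ == "__main__":
    final_h = rnn_process([1, 2, 3], step=lambda h, x: h + x, h0=0)
    print(final_h)  # 6: three updates, one per input token

    out = autoregressive_generate(
        prompt=[1],
        next_token=lambda toks: toks[-1] + 1 if toks[-1] < 5 else -1,
        eos=-1,
    )
    print(out)  # [1, 2, 3, 4, 5]: the loop ran until the stop condition
```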