Separately from my other comment, and more on the object level of your argument:
You focus on loops and say a feedforward network can’t be an “explicit optimizer”. Depending on what you mean by that term, I think you’re right.
I think it’s actually a pretty strong argument that a feedforward neural network itself can’t be much of an optimizer.
Transformers do some effective looping by running multiple forward passes: the output token of each pass is appended to the input of the next pass. That's a loop that incorporates their past computations into their new computations.
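To make that concrete, here's a minimal sketch of that autoregressive loop. The names (`dummy_model`, `sample`, `generate`) are illustrative stand-ins rather than any particular library's API; the only point is that each pass's output joins the input of the next pass.

```python
import random

def dummy_model(tokens):
    """Stub for one feedforward pass: returns a fake score for each vocabulary entry."""
    vocab_size = 50
    return [random.random() for _ in range(vocab_size)]

def sample(scores):
    """Greedy choice: pick the highest-scoring token."""
    return max(range(len(scores)), key=lambda i: scores[i])

def generate(model, prompt_tokens, max_new_tokens):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        scores = model(tokens)          # one forward pass over everything produced so far
        tokens.append(sample(scores))   # this pass's output becomes part of the next pass's input
    return tokens

print(generate(dummy_model, [1, 2, 3], max_new_tokens=5))
```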
When run for long enough, as in long chains of thought, they do indeed very explicitly consider multiple courses of action.
So I think your intuition on the computational level is correct, but you've resisted seeing that it's not terribly relevant to the world as it exists, in which transformers are run for many forward passes that effectively loop. You said in your comment (though not, as far as I noticed, in the article):
Yes, and relatedly LLMs are run in loops just to generate more than one token in general. This is different than running an explicit optimization algorithm within a single forward pass.
I think your intuition is correct and important, because looping is a good way to amplify effective intelligence and make it goal-directed, i.e., a good optimizer. That's why I expect transformers to become really dangerous when they're applied with more loops of metacognition. "Thinking" about themselves, and having longer and better memories to loop back on their past conclusions, is a necessary part of human intelligence. We haven't yet implemented much metacognition or memory for language-model-based agents. When we have added those extra loops, I expect them to be more capable and more dangerous. I expect them to be, by default, metaoptimizers in almost exactly the same way people are.
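For what I mean by "extra loops" of metacognition and memory, here's an illustrative sketch (hypothetical scaffolding, not any existing framework): an outer loop that stores the agent's past conclusions and has the model review them before acting again.

```python
def fake_llm(prompt):
    """Stub standing in for a call to a language model."""
    return f"(model response to a {len(prompt)}-character prompt)"

def run_agent(llm, task, steps=5):
    memory = []  # past conclusions the agent can loop back on
    for _ in range(steps):
        # Metacognitive pass: the model reviews its own earlier conclusions.
        reflection = llm(
            f"Task: {task}\nPast conclusions: {memory}\n"
            "Review the conclusions above and note anything worth revising."
        )
        # Action pass: the model produces its next conclusion given that reflection.
        conclusion = llm(
            f"Task: {task}\nReflection: {reflection}\n"
            "State your next conclusion or action."
        )
        memory.append(conclusion)  # fed back into every later pass
    return memory

print(run_agent(fake_llm, "plan a research project"))
```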