But one thing that has completely surprised me is that these LLMs and other systems like them are all feed-forward. It’s like the firing of the neurons is going only in one direction. And I would never have thought that deep thinking could come out of a network that only goes in one direction, out of firing neurons in only one direction. And that doesn’t make sense to me, but that just shows that I’m naive.
What was the argument that being feed-forward limited the potential for deep thought in principle? It makes sense that multi-directional nets could do more with fewer neurons, but Hofstadter seemed to think there were things a feed-forward system fundamentally couldn’t do.
He explains a lot of his position on this in Gödel, Escher, Bach. If I remember correctly, it describes the limits of primitive recursive and general recursive functions in Chapter XIII. The basic idea (again, if I remember) is that a proof system can only reason about itself if it’s general recursive, and will always be able to reason about itself if it is. A lot of what seems to make humanity special compared to computers has to do with people having feelings, emotions, and self-concepts, and reflecting on past situations and thoughts: all things that really seem to require deep levels of recursion (this is a far shallower statement than what’s actually written in the book). It’s strange to us, then, that ChatGPT can mimic those same outputs with the only recursive element of its thought being that it can pass 16 bits to its next running.
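For concreteness, that “16 bits” reading roughly corresponds to the most naive view of the sampling loop: ignoring any caching, the only thing one forward pass explicitly hands to the next is the sampled token id, an integer drawn from a vocabulary of tens of thousands of entries (log2 of ~50k is about 16 bits). A minimal sketch, where the model and the vocabulary size are stand-in assumptions:

```python
import math

VOCAB_SIZE = 50_257                     # illustrative vocabulary size (GPT-2-ish)
BITS_PER_TOKEN = math.log2(VOCAB_SIZE)  # ~15.6 bits of information per sampled token

def generate(model, prompt_ids, n_new_tokens):
    """Naive autoregressive loop: the only explicit state carried from one
    forward pass to the next is the growing list of token ids."""
    ids = list(prompt_ids)
    for _ in range(n_new_tokens):
        logits = model(ids)                                         # forward pass over all ids so far
        next_id = max(range(VOCAB_SIZE), key=lambda i: logits[i])   # greedy choice
        ids.append(next_id)                                         # one ~16-bit integer is "passed on"
    return ids
```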
“with the only recursive element of its thought being that it can pass 16 bits to its next running”
I would point to the activations for all previous tokens as the relevant “element of thought” that gets passed along here, and that can be gigabytes.
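To put a rough, purely illustrative number on “gigabytes”: the cached keys and values alone scale as 2 × layers × hidden width per token, so a long context adds up quickly. The dimensions below are assumptions, not the specs of any particular model:

```python
# Back-of-envelope KV-cache size for a hypothetical large transformer.
n_layers    = 80        # transformer blocks (assumed)
hidden_size = 8192      # model width (assumed)
seq_len     = 8192      # previous tokens being carried forward (assumed)
bytes_each  = 2         # fp16/bf16 per value

kv_bytes = 2 * n_layers * hidden_size * seq_len * bytes_each   # keys + values
print(f"{kv_bytes / 1e9:.1f} GB")                              # ~21.5 GB for this configuration
```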
From how the quote looks, I think his gripe is with the possibility of in-context learning, where human-like learning happens without anything about how the network works (neither its weights nor previous token states) being ostensibly updated.
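As a toy illustration of what “learning without anything being updated” means here: a model can pick up a rule purely from examples sitting in its context window, with no change to its weights or to any earlier token’s activations. The prompt and expected continuation below are hypothetical:

```python
# Illustrative few-shot prompt; the "learning" of the uppercasing rule happens
# entirely within the forward pass over this text, with no weight updates.
prompt = """apple -> APPLE
river -> RIVER
stone ->"""
# A capable LLM will typically continue with " STONE", having inferred the
# mapping only from the in-context examples above.
```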
I don’t understand this. Something is being updated when humans or LLMs learn, no?
For every token, model activations are computed once when the token is encountered and then never explicitly revised → “only [seems like it] goes in one direction”
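In code terms, this is incremental decoding with a key/value cache: each new token’s activations are computed once, appended, and attended to later, but never revisited. A minimal single-head sketch (shapes and names are my assumptions, not any library’s API):

```python
import numpy as np

def attend(q, K, V):
    """One new token's query against the cached keys/values of all past tokens."""
    scores = K @ q / np.sqrt(q.shape[-1])    # similarity to each earlier token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax over past positions
    return weights @ V                       # weighted sum of past values

def decode_step(x_t, W_q, W_k, W_v, cache):
    """Compute the new token's key/value once, append them, and never touch
    the earlier cache entries again (the 'one direction' in the quote)."""
    q, k, v = W_q @ x_t, W_k @ x_t, W_v @ x_t
    cache["K"].append(k)                     # append-only: past activations stay frozen
    cache["V"].append(v)
    return attend(q, np.stack(cache["K"]), np.stack(cache["V"]))
```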