with the only recursive element of its thought being that it can pass 16 bits to its next running
I would name activations for all previous tokens as the relevant “element of thought” here that gets passed, and this can be gigabytes.
From how the quote looks, I think his gripe is with the possibility of in-context learning, where human-like learning happens without anything about how the network works (neither its weights nor previous token states) being ostensibly updated.
From how the quote looks, I think his gripe is with the possibility of in-context learning, where human-like learning happens without anything about how the network works (neither its weights nor previous token states) being ostensibly updated.
I don’t understand this. Something is being updated when humans or LLMs learn, no?
For every token, model activations are computed once when the token is encountered and then never explicitly revised → “only [seems like it] goes in one direction”
I would name activations for all previous tokens as the relevant “element of thought” here that gets passed, and this can be gigabytes.
From how the quote looks, I think his gripe is with the possibility of in-context learning, where human-like learning happens without anything about how the network works (neither its weights nor previous token states) being ostensibly updated.
I don’t understand this. Something is being updated when humans or LLMs learn, no?
For every token, model activations are computed once when the token is encountered and then never explicitly revised → “only [seems like it] goes in one direction”