Not that I see! I would expect it to be fully indistinguishable until incompatible sensory input eventually reaches the brain (if it doesn’t wink out first). So far it seems to me like our intuitions around that are the same.
What makes it significant?
I think at least in terms of my own intuitions, it’s that there’s an unambiguous start and stop to each tick of the perceive-and-think-and-act cycle. I don’t think that’s true for human processing, although I’m certainly open to my mental model being wrong.
Going back to your original reply, you said ‘I think it’s really tricky to think that there are fundamental differences based on duration or speed of experience’, and that’s definitely not what I’m trying to point to. I think you’re calling out some fuzziness in the distinction between started/stopped human cognition and started/stopped LLM cognition, and I recognize that’s there. I do think that if you could perfectly freeze & restart human cognition, that would be more similar, so maybe it’s a difference in practice more than a difference in principle.
But it does still seem to me that the fully discrete start-to-stop cycle (including the environment only changing in discrete ticks coordinated with that cycle) is part of what makes LLMs more Boltzmann-brainy. Paired with the lack of internal memory, it means that you could give an LLM one context for this forward pass and a totally different context for the next forward pass, and that wouldn't be noticeable to the LLM, whereas it very much would be for humans (caveat: I'm unsure what happens to the residual stream between forward passes, whether it's reset for each pass or carried through to the next; if the latter, I think that might mean that switching context would be in some sense noticeable to the LLM [EDIT: it's fully reset for each pass in typical current architectures, other than KV caching, which shouldn't matter for behavior or for (hypothetical) subjective experience]).
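To make the "nothing carries over" point concrete, here's a minimal sketch in PyTorch. The model, names, and sizes are toy stand-ins of my own, not any real architecture or library API; the point is just that the forward pass is a pure function of the frozen weights and whatever context you hand it.

```python
import torch
import torch.nn as nn

# Toy stand-in for an LLM forward pass (class name, sizes, and architecture
# are mine, purely to illustrate statelessness; not a real model).
class TinyLM(nn.Module):
    def __init__(self, vocab_size=100, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.block = nn.Linear(dim, dim)       # stand-in for attention/MLP blocks
        self.unembed = nn.Linear(dim, vocab_size)

    def forward(self, context_tokens):
        # The activations ("residual stream") are created fresh here...
        h = torch.relu(self.block(self.embed(context_tokens)))
        # ...and discarded once the logits are returned. Nothing is written
        # back to `self`, so no state carries over to the next call.
        return self.unembed(h[-1])             # logits for the next token

model = TinyLM()
with torch.no_grad():
    logits_a = model(torch.tensor([1, 2, 3]))  # one context
    logits_b = model(torch.tensor([7, 8, 9]))  # a completely different context
# The second pass sees no trace of the first: only the context passed in
# (plus the frozen weights) determines the output. KV caching in real
# serving stacks is an optimization that reproduces the same numbers.
```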
This seems analogous with large model processing, where the activations and calculations happen over time, each with multiple processor cycles and different timeslices.
Can you explain that a bit? I think of current-LLM forward passes as necessarily having to happen sequentially (during normal autoregressive operation), since the current forward pass’s output becomes part of the next forward pass’s input. Am I oversimplifying?
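(To spell out what I mean by sequential: here's a rough sketch of the generation loop I'm picturing, with a toy stand-in for the forward pass; the function names are mine, not any library's.)

```python
import torch

# Rough sketch of why forward passes are sequential during generation.
# `toy_forward_pass` stands in for a full LLM forward pass over the context.
def toy_forward_pass(context: list[int], vocab_size: int = 100) -> torch.Tensor:
    torch.manual_seed(sum(context))          # deterministic stand-in for real compute
    return torch.randn(vocab_size)           # "logits" for the next token

context = [5, 17, 42]                        # the prompt, as token ids
for _ in range(10):                          # generate ten tokens
    logits = toy_forward_pass(context)       # pass N
    next_token = int(torch.argmax(logits))   # greedy sampling for simplicity
    context.append(next_token)               # pass N's output feeds pass N+1's input
# Each pass needs the token produced by the previous one, so the passes
# themselves can't run in parallel, even though computation *within* a
# single pass is parallel across context positions.
```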