I don’t think the human concept of a ‘plan’ is even sensible to apply here. What the model has is in many ways very much like a human plan, and in many other ways utterly unlike one.
One way in which you could view them as similar is that just as there is a probability distribution over the single-token output (which may be trivial at zero temperature), there is a corresponding probability distribution over all sequences of tokens. You could think of this distribution as a plan with decisions yet to be made. For example, there may be some small probability of continuing to “Once upon a horse, you may be concerned about falling off”, but by emitting “ time” the model ‘decides’ not to pursue such options and mostly commits to writing a fairy tale instead.
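To make that concrete, here’s a rough sketch of the ‘commitment by emission’ idea, assuming a Hugging Face causal LM (gpt2 is just an arbitrary stand-in, and the top-5 printout is purely illustrative):

```python
# Sketch: how emitting one token "commits" the distribution over continuations.
# Assumes the transformers and torch packages; "gpt2" is only a stand-in model.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Once upon a"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Distribution over the *next* token only; by the chain rule, the distribution
# over whole continuations is the product of these per-step distributions.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

top = torch.topk(next_token_probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(repr(tokenizer.decode([idx.item()])), round(p.item(), 4))

# Once " time" is emitted, every continuation that doesn't start with " time"
# (e.g. "Once upon a horse, ...") drops out of the reachable set.
```

Each per-step distribution composes with the next; emitting “ time” simply conditions everything downstream on that choice.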
However, this future structure is not explicitly modelled anywhere, as far as I know. It’s possible that some model might have a “writing a fairy tale” neuron in there somewhere, linked to others that represent describable aspects of the story so far and others yet to come, and which increases the weighting of the token ” time” after “Once upon a”. I doubt there’s anything so directly interpretable as that, but I think it’s pretty certain that there are some structures in activations representing clusters of continuations past the current generation token.
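If you wanted to poke at whether anything like that exists, one crude and entirely hypothetical way in is to compare hidden-state vectors for prompts that do and don’t set up a fairy tale. The prompts, layer choice, and cosine-similarity test below are my own assumptions, not an established probe:

```python
# Sketch: ask whether "about to write a fairy tale" shows up as a rough
# region/direction in activation space. Purely illustrative; nothing here is
# a claim about what gpt2 actually represents.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

def last_hidden(prompt, layer=-1):
    # Hidden state of the final position at the chosen layer.
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.hidden_states[layer][0, -1]

fairy_a = last_hidden("Once upon a time, in a faraway kingdom,")
fairy_b = last_hidden("Long ago, a brave princess set out")
report  = last_hidden("Quarterly revenue increased by 4% compared to")

cos = torch.nn.functional.cosine_similarity
print(float(cos(fairy_a, fairy_b, dim=0)))  # hope: relatively high
print(float(cos(fairy_a, report, dim=0)))   # hope: lower
```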
Should we call those structures “plans” or not?
If so, are these plans recreated from scratch? Well, in the low-level implementation sense, yes, since these types of LLM are stateless. However, we’re quite familiar with other systems that implement persistent state transitions over stateless underlying protocols, and the generated text can serve as a ‘cookie’ carried across thousands of tokens. The distinction between creating plans from scratch and persisting plans between generation steps isn’t so clear-cut in this case.
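A toy generation loop makes the ‘cookie’ reading concrete; again just a sketch, with greedy decoding and gpt2 standing in for any stateless model:

```python
# Sketch: a stateless model called repeatedly, with all persistence carried
# by the growing text itself (the 'cookie'). Greedy decoding for simplicity.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "Once upon a"
for _ in range(20):
    # Each call starts from nothing but the text so far: the model keeps no
    # memory between calls, so whatever "plan" exists is reconstructed from,
    # and persisted through, this string.
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    next_id = int(torch.argmax(logits[0, -1]))
    text += tokenizer.decode([next_id])

print(text)
```

(Real inference stacks reuse a key-value cache for speed, but that’s an optimization; the observable contract is still ‘full text in, next token out’.)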
> However, this future structure is not explicitly modelled anywhere, as far as I know. It’s possible that some model might have a “writing a fairy tale” neuron in there somewhere, linked to others that represent describable aspects of the story so far and others yet to come, and which increases the weighting of the token ” time” after “Once upon a”. I doubt there’s anything so directly interpretable as that, but I think it’s pretty certain that there are some structures in activations representing clusters of continuations past the current generation token.
More like a fairy-tale region than a neuron. And once the system enters that region, it stays there until the story is done.
> Should we call those structures “plans” or not?
In the context of this discussion, I can live with that.