This whole conversation has been very helpful. Thanks for your time and interest.
Some further thoughts:
First, as I’ve suggested in the OP, I am using the term “story trajectory” to refer to the complete set of token-to-token transitions ChatGPT makes in the course of telling a story. The trajectories for these stories have five segments. Given this, it seems clear to me that these stories are organized on three levels: 1) individual sentences, 2) sentences within a segment of the story trajectory, and 3) the whole story trajectory.
That gives us three kinds of transition from one token to the next: 1) from one word to the next word within a sentence, 2) from the last word of a sentence (ChatGPT treats end-punctuation as a word, at least that’s what it told me when I was asking it to count the number of words in a sentence) to the first word of the next sentence, and 3) from the last word in one trajectory segment to the first word in the next segment. We also have the beginning transition and the concluding transition. The beginning transition moves from the final token of the prompt to the first token in the story. The concluding transition moves from the last token of the story to what I assume is a wait state. I note that on a few occasions ChatGPT has concluded with “The end.” on a single line, but that is relatively rare.
That gives us five kinds of token-to-token transition: three within the story, and a pair that bracket it. Something different happens in each case. But in all cases, except the story end, we’re dealing with next-token prediction. What accounts for the differences between those kinds of next-token prediction? It seems to me that the context changes, and that changes the relevant probability distribution. Those context changes are the “plan.”
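To make that last point concrete, here is a toy sketch. It is not ChatGPT’s actual mechanism, and the tokens, contexts, and probabilities are all invented for illustration. The point is just that the sampling operation is identical at every step; only the context, and hence the conditional distribution, differs across the three kinds of within-story transition.

```python
import random

# Hypothetical conditional distributions keyed on the tail of the
# context. Everything here is made up for illustration; a real model
# learns these distributions rather than reading them from a table.
DISTRIBUTIONS = {
    # mid-sentence: ordinary word-to-word transition
    ("the", "brave"): {"knight": 0.6, "princess": 0.3, "dragon": 0.1},
    # sentence boundary: end punctuation was the last token
    (".",): {"One": 0.4, "The": 0.4, "Then": 0.2},
    # segment boundary: the context signals that the story's crisis is
    # resolved, so "concluding" tokens become far more likely
    ("resolved", "."): {"Finally": 0.7, "The": 0.2, "Then": 0.1},
}

def next_token(context):
    """Sample the next token given the context (longest suffix match)."""
    for n in (2, 1):
        key = tuple(context[-n:])
        if key in DISTRIBUTIONS:
            tokens, weights = zip(*DISTRIBUTIONS[key].items())
            return random.choices(tokens, weights=weights)[0]
    return "<unk>"

# The same operation each time; only the context -- and therefore the
# relevant probability distribution -- differs.
print(next_token(["the", "brave"]))    # within a sentence
print(next_token(["ended", "."]))      # sentence-to-sentence
print(next_token(["resolved", "."]))   # segment-to-segment
```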