But if I am right and ChatGPT isn’t choosing a number before it says “Ready,” why do you think that ChatGPT “has a plan?” Is the story situation crucially different in some way?
I think there is one difference: in the “write a story” case, the model subsequently generates the text without further variable input.
If the story is written in pieces with further variable prompting, I would agree that there is little sense in which it ‘has a plan’. To the extent that it could be said to have a plan, that plan is radically altered in response to every prompt.
I think this sort of thing is highly likely for any model of this type with no private state, though not essential. It could have a conditional distribution of future stories that is highly variable in response to instructions about what the story should contain and yet completely insensitive to mere questions about it, but I think that’s a very unlikely type of model. Systems with private state are much more likely to be trainable to query that state and answer questions about it without changing much of the state. Doing the same with merely an enormously high-dimensional implicit distribution seems too much of a balancing act for any training regimen to target.
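To make the "no private state" point concrete: in a standard transformer, the activation (key/value) cache is a memo of values computed from the visible tokens, so throwing it away and recomputing from the text gives the same next-token distribution (up to floating-point noise). The sketch below is a hypothetical one-layer toy with made-up embeddings and keys reused as values, not ChatGPT's architecture. In a real multi-layer model each cache entry depends on all earlier tokens rather than only its own, but causal attention preserves the same cache-equals-recompute property.

```python
import math
import random

VOCAB = ["Jack", "climbed", "the", "beanstalk", "giant", "."]
DIM = 4


def embed(token, position):
    """Hypothetical toy embedding: a fixed pseudo-random vector per (token, position)."""
    r = random.Random(f"{token}@{position}")
    return [r.uniform(-1.0, 1.0) for _ in range(DIM)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def next_token_probs(cache, query):
    """One toy attention head (keys double as values): attend from the newest
    position over every cached entry, then softmax over vocabulary scores."""
    weights = [math.exp(dot(query, k)) for k in cache]
    total = sum(weights)
    context = [sum(w * k[d] for w, k in zip(weights, cache)) / total for d in range(DIM)]
    logits = [dot(context, embed(tok, "out")) for tok in VOCAB]
    z = sum(math.exp(x) for x in logits)
    return [math.exp(x) / z for x in logits]


def probs_from_scratch(tokens):
    """Recompute every cache entry from the visible tokens alone."""
    cache = [embed(t, i) for i, t in enumerate(tokens)]
    return next_token_probs(cache, cache[-1])


def generate_with_cache(prompt, steps):
    """Greedy decoding that appends one cache entry per new token."""
    tokens = list(prompt)
    cache = [embed(t, i) for i, t in enumerate(tokens)]
    for _ in range(steps):
        probs = next_token_probs(cache, cache[-1])
        # The cached state carries nothing the visible text does not determine.
        assert probs == probs_from_scratch(tokens)
        tokens.append(VOCAB[probs.index(max(probs))])
        cache.append(embed(tokens[-1], len(tokens) - 1))
    return tokens


print(generate_with_cache(["Jack", "climbed"], 5))
```

Whatever "plan" lives in such a cache is therefore recoverable from the transcript; a system with genuine private state would need somewhere to keep information that the text does not determine.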
Suppose we modify the thought experiment so that we ask the LLM to simulate both sides of the “pick a number between 1 and 100” / “ask yes/no questions about the number” game. Now there is no new variable input from the user, but the yes/no questions still depend on random sampling. Would you now say that the LLM has chosen a number immediately after it prints out “Ready”?
Chosen a number: no (though it does at temperature zero).
Has something approximating a plan for how the ‘conversation’ will go (including which questions are most favoured at each step and go with which numbers), yes to some extent. I do think “plan” is a misleading word, though I don’t have anything better.
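A toy sketch of the number game, assuming a tidy binary-search questioner (hypothetical; an LLM's questions would not be this regular). It contrasts an answerer that commits to a number at “Ready” with one that stores only what the transcript already implies and samples each yes/no answer from the numbers still consistent with it. Both land on each number with the same probability, so from the outside they are indistinguishable. The `greedy` flag is a rough, assumed stand-in for temperature zero: with deterministic choices, the outcome is fixed before the first question is asked.

```python
import random
from collections import Counter

NUMBERS = list(range(1, 101))


def eager_answerer(rng):
    """Private state: commits to a secret number at "Ready", then answers truthfully."""
    secret = rng.choice(NUMBERS)
    return lambda question: question(secret)


def lazy_answerer(rng, greedy=False):
    """No private state: keeps only the set of numbers consistent with the visible
    transcript (a function of the transcript alone) and samples each answer from it.
    greedy=True always uses the smallest consistent number, a hypothetical stand-in
    for temperature-zero decoding with deterministic tie-breaking."""
    consistent = list(NUMBERS)

    def answer(question):
        nonlocal consistent
        n = consistent[0] if greedy else rng.choice(consistent)
        a = question(n)
        consistent = [m for m in consistent if question(m) == a]
        return a

    return answer


def play(answerer):
    """Binary-search questioner: asks "is it <= mid?" until one number is left."""
    lo, hi = 1, 100
    while lo < hi:
        mid = (lo + hi) // 2
        if answerer(lambda n: n <= mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


rng = random.Random(0)
trials = 10_000
eager = Counter(play(eager_answerer(rng)) for _ in range(trials))
lazy = Counter(play(lazy_answerer(rng)) for _ in range(trials))

# Both should land on each of the 100 numbers roughly 1% of the time.
print(len(eager), len(lazy))
print(max(eager.values()) / trials, max(lazy.values()) / trials)

# Greedy lazy answerer: every game ends on the same number.
print({play(lazy_answerer(None, greedy=True)) for _ in range(5)})  # → {1}
```

The lazy answerer's probability of ending on any given number telescopes to 1/100, exactly as if it had picked uniformly at “Ready”, which is why the two descriptions make the same predictions.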
Thank you, this is helpful.
I think the realization I’m coming to is that folks on this thread have a shared understanding of the basic mechanics (we seem to be agreed on what computations are occurring, we don’t seem to be making any different predictions), and we are unsure about interpretation. Do you agree?
For myself, I continue to maintain that viewing the system as a next-word sampler is not misleading, and that saying it has a “plan” is misleading, but I try to err very much on the side of not anthropomorphizing / not taking an intentional stance (I also try to avoid saying the system “knows” or “understands” anything). I do agree that the system’s activation cache contains a lot of information that collectively biases the next-word predictor towards producing the output it produces; I see how someone might reasonably call that a “plan”, although I choose not to.
FWIW, I’m not wedded to “plan.” And as for anthropomorphizing, there are many times when anthropomorphic phrasing is easier and more straightforward, so I don’t want to waste time trying to work around it with more complex phrasing. The fact is these devices are fundamentally new and we need to come up with new ways of talking about them. That’s going to take a while.
Read the comments I’ve posted earlier today. The plan is smeared through the parameter weights.
Then wouldn’t you believe that in the case of my thought experiment, the number is also smeared through the parameter weights? Or maybe it’s merely the intent to pick a number later that’s smeared through the parameter weights?
Lots of things are smeared through the parameter weights.
I’ve prompted ChatGPT with “tell me a story” well over a dozen times, independently in separate sessions. On three occasions I’ve gotten a story with elements from “Jack and the Beanstalk.” There’s the name, the beanstalk, and the giant. But the giant wasn’t blind, and there was no “fee fi fo fum.” Why that story three times? I figure it’s more or less an arbitrary fact of history that this story seems to be particularly salient for ChatGPT.
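A back-of-envelope check on “why that story three times?”, treating each session as an independent draw and taking “well over a dozen” as 13 sessions (both are assumptions, not measurements). If the model spread its choices evenly over something like fifty familiar tales (p = 0.02 per session), three or more hits would be well under 1% likely; if one tale carried around 20% of the probability, it would be close to a coin flip. That fits the idea that this particular story is unusually salient for ChatGPT, without saying why.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more hits in n independent sessions."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n = 13  # assumption: "well over a dozen" sessions, taken as 13
for p in (0.02, 0.05, 0.2):  # assumed per-session chance of a Jack-and-the-Beanstalk story
    print(f"p={p:.2f}: P(at least 3 of {n}) = {prob_at_least(3, n, p):.4f}")
```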