There’s a lot of speculation about how these models operate. You specifically say you “don’t know” how it works, yet you suggest it has some sort of planning phase.
As Wolfram explains, the Transformer architecture predicts one word (strictly, one token) at a time, conditioned on everything previously run through the model.
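To make that concrete, here’s a minimal sketch of the generation loop using the Hugging Face transformers library with GPT-2 (the model choice and greedy decoding are purely illustrative; ChatGPT itself samples from the distribution rather than always taking the top token):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("Once upon a time", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):
        logits = model(ids).logits        # scores over the vocab at each position
        next_id = logits[0, -1].argmax()  # greedy: most likely next token only
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)  # append and repeat
print(tok.decode(ids[0]))
```

Each pass through the loop appends exactly one token; there is no separate planning step anywhere in the architecture itself.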
Any planning you think you see is merely a pattern drawn from common techniques for answering questions. The five-part structure of storytelling is an established writing technique, so it is embedded in the model’s training data and shows up in its responses.
In the future, these models could very well have planning phases, and do more than next-word prediction aligned with common writing patterns.