There’s a lot of speculation about how these models operate. You specifically say “you don’t know” how it works, but suggest that it has some sort of planning phase.
As Wolfram explains, the Transformer architecture predicts one word at a time based on the previous inputs run through the model.
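To make that concrete, here is a minimal sketch of autoregressive generation. The “model” is just a toy bigram lookup I made up for illustration; a real Transformer would score every token in its vocabulary at each step, but the loop structure — produce one token, append it, condition the next prediction on everything so far — is the same.

```python
# Toy stand-in for a language model: maps the last token to the next one.
# Purely illustrative; a real Transformer outputs a probability
# distribution over the whole vocabulary at each step.
toy_bigram = {
    "once": "upon",
    "upon": "a",
    "a": "time",
}

def generate(prompt: str, steps: int) -> str:
    tokens = prompt.split()
    for _ in range(steps):
        # Each prediction conditions only on the tokens produced so far --
        # there is no separate planning pass over the future output.
        next_token = toy_bigram.get(tokens[-1])
        if next_token is None:
            break
        tokens.append(next_token)
    return " ".join(tokens)

print(generate("once", 3))  # -> "once upon a time"
```

The point is that any apparent “plan” emerges one token at a time from the conditional distribution, not from an explicit lookahead stage.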
Any planning you think you see is merely a trend based on common techniques for answering questions. The five-part storytelling structure is an established technique, commonly used in writing, and is thus embedded in the model’s training data and reflected in its responses.
In the future, these models could very well have planning phases—and more than next-word prediction aligned with common writing patterns.
If you look at the other comments I’ve made today you’ll see that I’ve revised my view somewhat.
As for real planning, that’s certainly what Yann LeCun talked about in the white paper he uploaded last summer.