Mostly due to the limited working memory that Transformers typically use (e.g., a buffer of only the most recent 512 tokens feeding into the decoder). When humans write novels, they have to keep track of plot points, character sheets, thematic arcs, etc. across tens of thousands of words. You could probably get it to work, though, if you augmented the LLM with content-addressable memory and included positional encoding that is aware of where in the novel (percentage-wise) each token resides.
Interesting—why is that?
Mostly due to the limited working memory that Transformers typically use (e.g., a buffer of only the most recent 512 tokens feeding into the decoder). When humans write novels, they have to keep track of plot points, character sheets, thematic arcs, etc. across tens of thousands of words. You could probably get it to work, though, if you augmented the LLM with content-addressable memory and included positional encoding that is aware of where in the novel (percentage-wise) each token resides.