[Question] If I ask an LLM to think step by step, how big are the steps?

By "big" I mean the number of tokens, and I am thinking about this question specifically in the context of training windows vs. context windows. The question is inspired by an Andrew Mayne tweet covered in AI #80: Never Have I Ever:

Most AI systems are trained on less than 2,000 words per sample. They can generalize across all of its training but no system to date trains on 80,000 words per sample. This will change with more memory and processing power. When that happens AI will be able to actually “read” entire novels and understand story structure on a fundamental level.

Gwern says context windows solve this. Pretty sure I am behind the curve on this one, but concretely: if LLMs cannot write a good novel because they are constrained by a training window of 2,000 tokens, would they also, under step-by-step prompting, tend to produce steps constrained to roughly that same length?
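One way I could make the question concrete is to just measure it: prompt a model to think step by step, split the response into its steps, and count the tokens in each. Below is a minimal sketch of that measurement, assuming the steps come back as lines starting with "Step N:" (an assumption about the output format, not something any particular model guarantees) and using OpenAI's tiktoken tokenizer for the counts:

```python
# Rough sketch: measure how many tokens each "step" of a step-by-step
# response contains. The "Step N:" delimiter and the sample text below
# are assumptions for illustration, not any model's actual output.

import re
import tiktoken  # pip install tiktoken

def step_token_counts(response: str) -> list[int]:
    """Split a step-by-step response on 'Step N:' lines and count tokens per step."""
    enc = tiktoken.get_encoding("cl100k_base")
    steps = re.split(r"(?m)^Step \d+:", response)
    steps = [s.strip() for s in steps if s.strip()]
    return [len(enc.encode(s)) for s in steps]

if __name__ == "__main__":
    # Hypothetical model output, just to show the shape of the measurement.
    sample = (
        "Step 1: Establish the protagonist's goal and the central conflict.\n"
        "Step 2: Outline the first act, ending on the inciting incident.\n"
        "Step 3: Sketch the midpoint reversal and how it raises the stakes.\n"
    )
    counts = step_token_counts(sample)
    print(counts, "tokens per step; mean =", sum(counts) / len(counts))
```

Running something like this over many prompts would show whether step sizes cluster well below the 2,000-token figure from the tweet, which is really what I am asking.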

On the other side of the coin, how does a larger context window resolve this, especially when the model needs to draw on information outside of what was provided in the context window?
