At each newly generated token, it still assumes that the past 999 tokens were written by humans
By no means is this necessarily the case. During fine-tuning for dialogue and question answering, GPT is clearly selected for discriminating the boundaries of user-generated and, equivalently, self-generated text in its context (and these boundaries are probably marked with special control tokens).
If we were talking about GPTs trained in pure SSL mode without any fine-tuning whatsoever, that would be a different story, but in practice this is not the case.
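To make the control-token point concrete, here is a minimal sketch of how such boundary marking works in practice. The token strings follow the ChatML convention used by some open chat models; the exact tokens and the `format_chatml` helper are illustrative assumptions, not a specific model's documented API.

```python
# Sketch: fine-tuned chat models wrap each turn in special control
# tokens, so at every position the model can tell whether the text it
# conditions on was user-written or self-generated.
# The <|im_start|>/<|im_end|> strings follow the ChatML convention;
# exact tokens vary by model family (illustrative assumption).

def format_chatml(turns):
    """Format (role, text) turns with role-tagged boundary tokens."""
    out = []
    for role, text in turns:
        out.append(f"<|im_start|>{role}\n{text}<|im_end|>")
    # Leave an open assistant turn: generation continues from here,
    # so everything after this marker is known to be self-generated.
    out.append("<|im_start|>assistant\n")
    return "\n".join(out)

prompt = format_chatml([("user", "What is 2 + 2?")])
print(prompt)
```

The point is that nothing in the context is ambiguous to the fine-tuned model: the role tags explicitly delimit which spans were written by the user and which by the model itself.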