Uncertainty about location within the text
I think the models are evaluated on inputs that fill their whole context window, i.e. ~1024 tokens long. I doubt there are many passages in Shakespeare’s plays where the same 1024 tokens are repeated.
Oh I didn’t realize! Thanks for clarifying. Uncertainty about location probably doesn’t contribute much to the loss then.
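For anyone who wants to check the repetition claim rather than take it on faith, here is a rough sketch of how one might measure it. It assumes the text is already tokenized into a list; the demo uses whitespace splitting as a stand-in for the model's real tokenizer, and `repeated_window_fraction` is just a name made up for illustration.

```python
from collections import Counter


def repeated_window_fraction(tokens, window):
    """Return the fraction of length-`window` slices of `tokens`
    that occur more than once anywhere in the sequence."""
    if window <= 0:
        raise ValueError("window must be positive")
    n_windows = len(tokens) - window + 1
    if n_windows <= 0:
        return 0.0
    # Store only a hash per window to keep memory modest; a hash
    # collision could in principle overcount, so treat the result
    # as an estimate rather than an exact figure.
    counts = Counter(
        hash(tuple(tokens[i:i + window])) for i in range(n_windows)
    )
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / n_windows


if __name__ == "__main__":
    # Toy input; whitespace tokens are an assumption, not the model's tokenizer.
    toy = "to be or not to be that is the question".split()
    print(repeated_window_fraction(toy, 2))
```

With the full corpus and `window=1024`, a result close to zero would back up the point that a full context window almost always pins down the location in the text.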