Peter Hroššo comments on chinchilla’s wild implications

Peter Hroššo 21 Aug 2022 5:17 UTC
2 points
Uncertainty about location within the text
I think the models are evaluated on inputs that fill their whole context window, i.e. ~1024 tokens long. I doubt there are many passages in Shakespeare’s plays where the same 1024 tokens are repeated.
- harsimony 21 Aug 2022 17:52 UTC
  1 point
  Oh, I didn’t realize! Thanks for clarifying. Uncertainty about location probably doesn’t contribute much to the loss, then.