On the other hand, the tens of billions of tokens fed to LLMs are orders of magnitude beyond what humans could ever experience.
Nitpick, but one of the LLaMA versions was trained on 1.4T tokens and GPT-4 probably had an even larger training dataset.
“Trillions of tokens” feels more accurate for SOTA language models.
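
For a rough sense of the gap, here's a back-of-envelope sketch; the per-day exposure rate and tokens-per-word ratio are assumptions for illustration, not measurements:

```python
# Rough comparison of human language exposure vs. LLM training data.
WORDS_PER_DAY = 20_000      # assumed words heard/read per day
YEARS = 30                  # assumed span of accumulation
TOKENS_PER_WORD = 1.3       # assumed tokenizer ratio (BPE-style)

human_tokens = WORDS_PER_DAY * 365 * YEARS * TOKENS_PER_WORD
llama_tokens = 1.4e12       # LLaMA trained on ~1.4T tokens

print(f"human exposure ≈ {human_tokens:.1e} tokens")
print(f"LLaMA training ≈ {llama_tokens:.1e} tokens")
print(f"ratio ≈ {llama_tokens / human_tokens:,.0f}x")
```

Under those assumptions a human sees a few hundred million tokens in decades, so trillion-token training sets come out three to four orders of magnitude larger.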