On the other hand, the tens of billions of tokens fed to LLMs are orders of magnitude beyond what humans could ever experience.
Nitpick, but one of the LLaMA versions was trained on 1.4T tokens and GPT-4 probably had an even larger training dataset.
“Trillions of tokens” feels more accurate for SOTA language models.
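
For a rough sense of the gap, here's a back-of-envelope sketch; the per-day exposure rate and tokens-per-word ratio are assumptions for illustration, not measurements:

```python
# Rough comparison of human language exposure vs. LLM training data.
WORDS_PER_DAY = 20_000      # assumed words heard/read per day
YEARS = 30                  # assumed span of accumulation
TOKENS_PER_WORD = 1.3       # assumed tokenizer ratio (BPE-style)

human_tokens = WORDS_PER_DAY * 365 * YEARS * TOKENS_PER_WORD
llama_tokens = 1.4e12       # LLaMA trained on ~1.4T tokens

print(f"human exposure ≈ {human_tokens:.1e} tokens")
print(f"LLaMA training ≈ {llama_tokens:.1e} tokens")
print(f"ratio ≈ {llama_tokens / human_tokens:,.0f}x")
```

Under those assumptions a human sees a few hundred million tokens in decades, so trillion-token training sets come out three to four orders of magnitude larger.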