OK, let’s examine a more conservative scenario using only visual input. If we take 10 megabits/s as the base and deduct roughly 30% (about eight hours a day) to account for sleep, we end up with roughly 0.78 petabytes accumulated over 30 years. At about 5 bytes per token, this translates to approximately 157 trillion tokens over 30 years, or around 5.24 trillion tokens annually. Interestingly, even under these conservative conditions, the estimate exceeds the training data of LLMs (~1 trillion tokens) by roughly two orders of magnitude.
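For anyone who wants to check the arithmetic, here is a quick back-of-envelope sketch. The bit rate, the sleep deduction, and the ~5-bytes-per-token conversion are the assumptions stated above, not measured values; the exact fractions chosen shift the totals slightly but not the order-of-magnitude conclusion.

```python
# Back-of-envelope check of the conservative visual-input estimate.
# Assumptions (taken from the text, not measurements): 10 Mbit/s of visual
# input, ~8 hours of sleep per day, 30 years, and ~5 bytes per token.

SECONDS_PER_YEAR = 365 * 24 * 3600
YEARS = 30

bits_per_second = 10e6      # 10 megabits/s of visual input
awake_fraction = 16 / 24    # deduct sleep (~8 hours a day)
bytes_per_token = 5         # assumed byte-to-token conversion

total_bytes = (bits_per_second / 8) * awake_fraction * SECONDS_PER_YEAR * YEARS
total_tokens = total_bytes / bytes_per_token

print(f"Data over {YEARS} years:   {total_bytes / 1e15:.2f} PB")          # ~0.79 PB
print(f"Tokens over {YEARS} years: {total_tokens / 1e12:.0f} trillion")   # ~158 trillion
print(f"Tokens per year:     {total_tokens / (YEARS * 1e12):.2f} trillion")  # ~5.26 trillion
print(f"Ratio to ~1T-token LLM corpus: {total_tokens / 1e12:.0f}x")
```

Using a flat 30% deduction instead of a full third pushes the total to roughly 0.83 petabytes; either way the result sits about two orders of magnitude above a ~1-trillion-token training corpus.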