The data rate of optical information through the human optic nerves to the brain has variously been estimated at about 1-10 megabits per second, which is two or three orders of magnitude smaller than the estimate here. Likewise, the bottleneck on tactile sensory information is in the tactile nerves, not the receptors. I don't know about the taste receptors, but I very much doubt that distinct information from every receptor goes into the brain.
While the volume of training data is still likely larger than that of current LLMs, I don't think the ratio is anywhere near as large as the conclusion states. A quadrillion "tokens" per year is an extremely loose upper bound, not a lower bound.
OK, let's examine a more conservative scenario using solely visual input. If we take 10 megabits/s as the base and deduct 30% to account for sleep time, we end up with roughly 0.78 petabytes accumulated over 30 years. At roughly 5 bytes per "token," that translates to approximately 157 trillion tokens in 30 years, or around 5.24 trillion tokens annually. Interestingly, even under these conservative conditions, the estimate still exceeds the training data of current LLMs (~1 trillion tokens) by about two orders of magnitude.
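A minimal sketch of the arithmetic, in case anyone wants to vary the inputs: the 10 megabits/s rate, the 30% sleep deduction, and the 30-year horizon are taken from the estimate above, while the 5-bytes-per-token conversion is my assumption (roughly what the quoted token counts imply). Small differences from the figures above (~0.78 PB vs. ~0.83 PB) come down to rounding and unit conventions.

```python
# Back-of-envelope check of the conservative visual-input estimate.
# Inputs from the comment above: 10 Mbit/s optic-nerve rate, 30% of the
# day asleep, 30 years. The bytes-per-token figure is an assumption,
# not something stated in the source.

SECONDS_PER_YEAR = 365 * 24 * 3600
RATE_BITS_PER_SECOND = 10e6   # 10 megabits/s
AWAKE_FRACTION = 0.7          # deduct 30% for sleep
YEARS = 30
BYTES_PER_TOKEN = 5           # assumed conversion factor

total_bits = RATE_BITS_PER_SECOND * AWAKE_FRACTION * SECONDS_PER_YEAR * YEARS
total_bytes = total_bits / 8
total_tokens = total_bytes / BYTES_PER_TOKEN

print(f"total data: {total_bytes / 1e15:.2f} PB")                      # ~0.83 PB
print(f"total tokens: {total_tokens / 1e12:.0f} trillion")             # ~166 trillion
print(f"tokens per year: {total_tokens / YEARS / 1e12:.1f} trillion")  # ~5.5 trillion
```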