I did some calculations with a bunch of assumptions and simplifications, but here’s a high-end, back-of-the-envelope estimate of the data and “tokens” a 30-year-old human would have “trained” on:
Visual data: 130 million photoreceptor cells firing at 10 Hz (~1 bit per firing) = 1.3 Gbit/s = 162.5 MB/s, over 30 years (approx. 946,080,000 seconds) = 153.7 Petabytes
Auditory data: Humans can hear frequencies up to 20,000 Hz, and high-quality audio is sampled at 44.1 kHz, satisfying the Nyquist-Shannon sampling theorem. Assuming 16-bit samples (CD quality) × 2 channels (stereo) = 1.41 Mbit/s = 0.18 MB/s, over 30 years = 0.167 Petabytes
Tactile data: 4 million touch receptors providing 8 bits/s each (assuming they account for temperature, pressure, pain, hair movement and vibration) = 32 Mbit/s = 4 MB/s, over 30 years = 3.78 Petabytes
Olfactory data: We can detect up to 1 trillion smells. Assuming we process 1 smell every second and each smell is represented as its own piece of data, i.e. log2(1 trillion) ≈ 40 bits/s = 0.000005 MB/s, over 30 years = 0.0000047 Petabytes
Taste data: 10,000 receptors, assuming a unique identifier for each basic taste (sweet, sour, salty, bitter and umami): log2(5) ≈ 2.3 bits, rounded up to 3 bits/s per receptor = 30 kbit/s = 0.00375 MB/s, over 30 years = 0.0035 Petabytes
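For reference, the log2 figures are just the number of bits needed to label one of N equally likely options; a quick sanity check in Python:

```python
import math

# bits needed to pick one of N equally likely alternatives = log2(N)
print(math.log2(1e12))  # ~39.86 -> rounded up to 40 bits per smell
print(math.log2(5))     # ~2.32  -> rounded up to 3 bits per taste signal
```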
This amounts to 153.7 + 0.167 + 3.78 + 0.0000047 + 0.0035 ≈ 157.7 Petabytes. Assuming 5 bytes per token (i.e. 5 characters), this comes to roughly 31,500 T tokens.
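For anyone who’d rather check the arithmetic in code than in the sheet, here’s a minimal Python sketch of the same tally. The per-stream rates are just the assumptions above (not measurements); the 365-day year and decimal petabytes (10^15 bytes) are simplifications.

```python
# Minimal sketch of the back-of-the-envelope tally above.
import math

SECONDS_30_YEARS = 30 * 365 * 24 * 3600   # ~946,080,000 s, ignoring leap days

# Assumed data rates in bits per second
rates_bps = {
    "visual":    130e6 * 10,                  # 130M photoreceptors firing at 10 Hz, ~1 bit each
    "auditory":  44_100 * 16 * 2,             # CD-quality stereo: 1,411,200 bit/s
    "tactile":   4e6 * 8,                     # 4M touch receptors at 8 bit/s each
    "olfactory": math.ceil(math.log2(1e12)),  # 1 smell/s, ~40 bits to label one of 10^12
    "taste":     10_000 * 3,                  # 10k receptors at ceil(log2(5)) = 3 bit/s
}

PETABYTE = 1e15       # bytes (decimal prefix)
BYTES_PER_TOKEN = 5   # ~5 characters per token

totals_pb = {name: bps / 8 * SECONDS_30_YEARS / PETABYTE for name, bps in rates_bps.items()}
total_pb = sum(totals_pb.values())
tokens_trillions = total_pb * PETABYTE / BYTES_PER_TOKEN / 1e12

for name, pb in totals_pb.items():
    print(f"{name:<10} {pb:12.6f} PB")
print(f"{'total':<10} {total_pb:12.2f} PB  ~= {tokens_trillions:,.0f} T tokens")
```

Running it gives ~153.7 PB visual, ~0.167 PB auditory, ~3.78 PB tactile, with olfactory and taste negligible, for a total of ~157.7 PB and ~31,500 T tokens.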
This is of course a high estimate, and most of this data clearly has huge potential for compression, but I wanted to get a rough sense of an upper bound.
Here’s the Google Sheet if anyone wants to copy it or contribute.