In my social circles, I frequently tell a joke making fun of the awful lot of cute kitten pictures available on the internet (“somewhere in the world, a whole server farm is doing nothing but storing pictures of cute kittens”). Joking aside, there are thousands of data centers around the world, and the world’s total data storage capacity is measured in zettabytes.
How much of this storage do you think is actually occupied by cute kitten pictures? What would be an effective way to make a Fermi estimate?
Look at the ImageNet (https://image-net.org/index.php) tags and find the percentage of them that are kitten pictures. The International Data Corporation estimates there are around 6.8 zettabytes of storage globally (https://www.idc.com/getdoc.jsp?containerId=prUS46303920). Now we just need the fraction of total storage dedicated to consumer images. Maybe 2%?
I’d guess something like (0.1% kitten pictures) x (2% consumer images) x (6.8 zettabytes) = 136,000 terabytes of kitten images.
Oh no! You missed out on stating this as 136 PETabytes!
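For anyone who wants to redo the unit conversion, here is a minimal sketch of the product of the three guessed factors. It assumes decimal SI units (1 zettabyte = 10^21 bytes); the variable names are just for illustration, and the inputs are the guesses from the comment above, not measured values.

```python
# Fermi estimate: total storage x share of consumer images x share of kittens.
# Assumes decimal SI prefixes; all three inputs are guesses from the thread.
ZETTABYTE = 10**21
PETABYTE = 10**15
TERABYTE = 10**12

total_storage_bytes = 6.8 * ZETTABYTE  # IDC figure quoted above
consumer_image_share = 0.02            # guessed: 2% of storage is consumer images
kitten_share = 0.001                   # guessed: 0.1% of images are kittens

kitten_bytes = total_storage_bytes * consumer_image_share * kitten_share

print(f"{kitten_bytes / TERABYTE:,.0f} TB")
print(f"{kitten_bytes / PETABYTE:,.0f} PB")
```

Changing either share by a factor of ten moves the answer by a factor of ten, so the result is only as good as the weakest guess.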
I arrived at a figure in the same ballpark via a different path. I suspect that almost every cute kitten picture is stored in just a few (1-5?) locations. I assumed a power law in which maybe 1% of the world population keeps 100-ish kitten pictures (almost all of which are cute by default) and maybe 5% more keep an average of 10-ish pictures. I’d estimate each picture to be on the order of a megabyte.
That yields around 36 petabytes.
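A rough sketch of this bottom-up estimate follows. Two inputs are not stated in the comment and are assumptions here: a world population of about 8 billion, and 3 stored copies per picture (the middle of the 1-5 range). With those choices the arithmetic lands on the quoted figure.

```python
# Bottom-up Fermi estimate from per-person picture counts.
WORLD_POPULATION = 8e9     # assumption: roughly 8 billion people
COPIES_PER_PICTURE = 3     # assumption: midpoint of the "1-5 locations" guess
BYTES_PER_PICTURE = 1e6    # "on the order of a megabyte"
PETABYTE = 1e15

# Two tiers of the guessed power law.
heavy_keepers = 0.01 * WORLD_POPULATION   # 1% keep ~100 pictures each
light_keepers = 0.05 * WORLD_POPULATION   # 5% more keep ~10 pictures each
pictures = heavy_keepers * 100 + light_keepers * 10

total_bytes = pictures * BYTES_PER_PICTURE * COPIES_PER_PICTURE
print(f"{pictures:.2e} pictures, {total_bytes / PETABYTE:.0f} PB")
```

Note that the 1% tier contributes twice as many pictures as the 5% tier, so the heavy keepers dominate the total.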
ImageNet was constructed to match the WordNet hierarchy, and is not representative of the distribution of images stored online. I’d guess that cat pics are 10x to 10,000x overrepresented in it.
I’d also be shocked if consumer images were even 0.1% of all data stored; there’s a huge volume of other, heavier datasets out there.
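To see how much these two objections matter, here is a hedged sketch that reruns the top-down estimate with the adjusted inputs. It is one possible reading of the critique, not the commenter's own calculation: it treats 0.1% as an upper bound on the consumer-image share, and divides the original 0.1% kitten share by the guessed overrepresentation factors of 10 and 10,000. The function name is hypothetical.

```python
# Sensitivity check: rerun the top-down estimate with the critic's adjustments.
ZETTABYTE = 10**21
PETABYTE = 10**15
TERABYTE = 10**12

def kitten_storage_bytes(total_bytes, consumer_share, kitten_share):
    """Product of the three Fermi factors, in bytes."""
    return total_bytes * consumer_share * kitten_share

total = 6.8 * ZETTABYTE
consumer_share_upper = 0.001     # "even 0.1%", read as an upper bound
imagenet_kitten_share = 0.001    # the original 0.1% guess

# Divide the ImageNet-based share by the guessed overrepresentation factor.
high = kitten_storage_bytes(total, consumer_share_upper, imagenet_kitten_share / 10)
low = kitten_storage_bytes(total, consumer_share_upper, imagenet_kitten_share / 10_000)

print(f"upper end: {high / PETABYTE:.2f} PB")
print(f"lower end: {low / TERABYTE:.2f} TB")
```

Under these assumptions the estimate drops from the petabyte scale to somewhere between terabytes and a fraction of a petabyte, which shows how sensitive the top-down route is to the ImageNet-based share.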