Noise: the world is noisy and infinitely detailed. The training data for all but the simplest toy models contain some amount of noise in both inputs and labels. Your picture of a cat will not be a platonically perfect cat: it will have imperfections due to pixellation, and due to atmospheric phenomena and camera artefacts that degrade the image; the cat’s fur will be affected by accidents of dirt and discoloration. Labels may be garbled or imprecise. Etc. Similarly, text (though usually thought of as discrete, and thus seemingly less susceptible to noise than pictures) suffers from external noise: the writer may be affected by distractions in the environment, by texts read recently, and so on. While it’s possible to capture some of this (e.g. mood) in a predictive text-generation process, there will always be some sufficiently fine-grained random context (that mosquito bite behind your left shoulder that makes you remember a hiking trip with your grandpa and causes your writing to be more wistful) that must ultimately be abstracted out as noise by state-of-the-art ML systems.
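For concreteness, here is a minimal sketch (in Python with NumPy; the toy dataset and the noise rates are illustrative assumptions, not measurements) of the two kinds of corruption described above: additive noise on the inputs and random flips of the labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "clean" toy dataset: 100 grayscale 8x8 images with binary labels.
images = rng.uniform(0.0, 1.0, size=(100, 8, 8))
labels = rng.integers(0, 2, size=100)

# Input noise: sensor grain / compression artefacts, modeled crudely
# as additive Gaussian noise, clipped back into the valid pixel range.
noisy_images = np.clip(images + rng.normal(0.0, 0.05, size=images.shape), 0.0, 1.0)

# Label noise: a small fraction of labels are garbled (flipped) uniformly at random.
flip_mask = rng.uniform(size=labels.shape) < 0.03  # ~3% flip rate: an illustrative assumption
noisy_labels = np.where(flip_mask, 1 - labels, labels)

print(f"{int(flip_mask.sum())} of {labels.size} labels flipped")
```

Real corruption processes are of course messier and usually unknown; that is exactly why a learner has to treat them as generic noise rather than model them explicitly.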
The deep reason for this irreducibility, at a high level, is quantum physics: the uncertainty principles don’t allow you to remove all noise from a system, or even arbitrarily much of it, which means that labels and inputs from essentially any physical source can have only finite accuracy:
https://en.wikipedia.org/wiki/Quantum_noise
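As one concrete instance of the kind of bound at play (the standard Heisenberg relation, stated here for position and momentum rather than for any particular sensor):

\[
\Delta x \, \Delta p \;\ge\; \frac{\hbar}{2}
\]

so any physical measurement channel feeding a dataset carries an irreducible noise floor.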