But it is an inherently odd proposition that you can get a better picture of the environment by adding noise to your sensory information—by deliberately throwing away your sensory acuity. This can only degrade the mutual information between yourself and the environment. It can only diminish what in principle can be extracted from the data.
It is certainly counterintuitive to think that, by adding noise, you can get more out of data. But it is nevertheless true.
Every detection system has a perceptual threshold, a level of stimulation needed for it to register a signal. If the system is mostly noise-free, this threshold is a ‘sharp’ transition. If the system has a lot of noise, the threshold is ‘fuzzy’. The noise present at one moment might destructively interact with the signal, reducing its strength, or constructively interact, making it stronger. The result is that the threshold becomes an average; it is no longer possible to know whether the system will respond merely by considering the strength of the signal.
When dealing with a signal that is just below the threshold, a noiseless system won’t be able to perceive it at all. But a noisy system will pick out some of it—some of the time, the noise and the weak signal will add together in such a way that the result is strong enough for the system to react to it positively.
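This is easy to check in simulation. Here is a minimal sketch (the threshold, signal level, and noise amplitude are illustrative, not from the text): a hard-threshold detector never responds to a sub-threshold signal, but with noise added it responds on a predictable fraction of trials, so the signal shows up in the response rate.

```python
import random

# Hypothetical numbers: a detector with a hard threshold at 1.0,
# watching a signal that sits just below it at 0.9.
THRESHOLD = 1.0
SIGNAL = 0.9
TRIALS = 100_000

def fires(stimulus):
    return stimulus >= THRESHOLD

quiet = sum(fires(SIGNAL) for _ in range(TRIALS))
noisy = sum(fires(SIGNAL + random.gauss(0, 0.2)) for _ in range(TRIALS))

print(quiet / TRIALS)  # 0.0 -- the noiseless detector never responds
print(noisy / TRIALS)  # ~0.31 -- P(noise >= 0.1) for Gaussian sigma 0.2
```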
You can see this effect demonstrated at science museums. If an image is printed very, very faintly on white paper, just at the human threshold for visual detection, you can stare right at the paper and not see what’s there. But if the same image is printed onto paper on which a random pattern of grey dots has also been printed, you can suddenly perceive some of it—and extrapolate the whole from the random parts you can see. We are very good at extracting data from noisy systems, but only if we can perceive the data in the first place. The noise makes it possible to detect the data carried by weak signals.
When trying to make out faint signals, static can be beneficial, which is why biological organisms introduce noise into their detection physiologies—a fact which surprised biologists when they first learned of it.
The pattern painted onto white paper can’t be seen because the image is also white. If the white image is printed onto paper that has parts which aren’t white, of course it’s going to be more visible. Adding noise would be the equivalent of taking the image already printed onto white paper and just adding random static on top of it. That would make it still harder to see.
What you’re saying just makes no sense to me. Adding noise is just as likely to increase the existing signal as it is to decrease it. Or to make a signal appear that isn’t there at all. I can’t see how it’s doing anything to help detect the signal.
What you’re missing is that, if the signal is below the detection threshold, there is no loss if the noise pushes it farther below the detection threshold, whereas there is a gain when the noise pushes the signal above the detection threshold. Thus the noise increases sensitivity, at the cost of accuracy. (And since a lot of sensory information is redundant, the loss of accuracy is easy to work around.)
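To make the asymmetry concrete, the same sketch as above can be run on two signals, one just below the threshold and one just above it (again with illustrative numbers): below the threshold the noise is pure gain, above it the noise costs accuracy, which averaging can buy back.

```python
import random

# Same hypothetical detector as before: threshold 1.0, Gaussian noise.
THRESHOLD, SIGMA, TRIALS = 1.0, 0.2, 100_000

def firing_rate(signal, noise_sigma):
    hits = sum(signal + random.gauss(0, noise_sigma) >= THRESHOLD
               for _ in range(TRIALS))
    return hits / TRIALS

for signal in (0.9, 1.1):
    print(signal, firing_rate(signal, 0.0), firing_rate(signal, SIGMA))
# 0.9: 0.0 -> ~0.31 -- pure gain; there was nothing to lose below threshold
# 1.1: 1.0 -> ~0.69 -- the accuracy cost, recoverable by averaging
```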
In which case, you could view the image even better if you just changed the whole backdrop to gray, instead of just random parts of it. This would correspond to the “using the same knowledge to produce a superior algorithm” part of the article.
As I understood it, the article specifically did not state that you can’t ever improve a deterministic algorithm by adding randomness—only that needing to is a sign that your algorithm is crap, not that the problem fundamentally requires randomness. There should always exist a different deterministic algorithm which is more accurate than your random algorithm (at least in theory—in practice, that algorithm might have an unacceptable runtime, or it might require even more knowledge than you have).
This post is my first experience learning about noise in algorithms, so forgive me if I seem underinformed. Two points occurred to me while reading this comment, some clarification would be great:
First, while it was intriguing to read that input just below the perceptual threshold would half the time be perceived by bumping it above the threshold, it seems to me that input just above the threshold would half the time be knocked below it. So wouldn’t noise lead to no gain? Just a loss in acuity?
Second, I’m confused how input below the perceptual threshold is actually input. If a chair moves in front of a camera so slightly that the camera doesn’t register a change in position, the input seems to me like zero, and noise loud enough to move zero past the perceptual threshold would not distinguish between movement and stillness, but go off half the time and half the time be silent. If that doesn’t make sense, assume that the threshold is .1 meters, and the camera doesn’t notice any movement less than that. Let’s say your noise is a random number between .01 meters and -.01 meters. The chair moves .09 meters, and your noise lands on .01 meters. I wouldn’t think that would cross the threshold, because the camera can’t actually detect that .09 meters if its threshold is .1. So, wouldn’t the input just be 0 motion detected + .01 meters of noise = .01 meters of motion? Maybe I’m misunderstanding.
Suppose you have a motion-detector that looks once per second and notices a change when the chair moves by 0.1m within a second and is completely blind to smaller changes. Then a chair moving at 0.09m/s won’t trigger it at all. Now suppose you add noise of amplitude ±0.01m. Then in most seconds you still won’t see anything, but sometimes (I think 1/8 of the time, if that noise is uniformly distributed) the apparent movement will be above the threshold. So now if you do some kind of aggregation of the detector output over time you’ll be able to tell that the chair is moving.
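The 1/8 figure checks out if you read it as noise landing independently on each of the two position readings that bracket a second, so the apparent displacement is 0.09 plus the difference of two uniform draws. A quick Monte Carlo sketch of that reading:

```python
import random

# Noise of amplitude +-0.01m on each once-per-second position reading;
# the apparent per-second displacement is 0.09 + (n2 - n1).
TRIALS = 1_000_000
THRESHOLD = 0.10   # detector fires on apparent movement >= 0.1m
TRUE_MOVE = 0.09   # actual movement per second

hits = 0
for _ in range(TRIALS):
    n1 = random.uniform(-0.01, 0.01)
    n2 = random.uniform(-0.01, 0.01)
    if TRUE_MOVE + (n2 - n1) >= THRESHOLD:
        hits += 1

print(hits / TRIALS)  # ~0.125: the difference of two uniforms is triangular
                      # on [-0.02, 0.02], and its tail beyond 0.01 is 1/8
```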
Yes, the cost of this is that above the threshold your performance is worse. You’ll need to take averages or something of the kind to make up for it. (But: when a detector has a threshold, it usually doesn’t give perfectly accurate measurements just above the threshold. You may find that even above the threshold you actually get more useful results in the presence of noise.)
Another example. Suppose you are trying to detect oscillating signals (musical notes, radio waves, …) via an analogue-to-digital converter. Let’s say its resolution is 1 unit. Then a signal oscillating between −0.5 and +0.5 will not show up at all: every time you sample it you’ll get zero. And any small change to the signal will make exactly no difference to the output. But if you add enough noise to that signal, it becomes detectable. You’ll need to average your data (or do something broadly similar); you’ll have some risk of false positives; but if you have enough data you can measure the signal pretty well even though it’s well below the threshold of your ADC.
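A sketch of this dithering scheme, with made-up numbers: a sine of amplitude 0.4 is invisible to a unit-resolution converter (every sample quantizes to zero), but adding noise before quantizing and then correlating the samples with a reference recovers the amplitude.

```python
import math
import random

N = 100_000
AMP, FREQ = 0.4, 0.05              # amplitude well below the 1-unit ADC step

def adc(x):                        # unit-resolution converter (round half up)
    return math.floor(x + 0.5)

def sample(t, noise_sigma):
    return adc(AMP * math.sin(2 * math.pi * FREQ * t)
               + random.gauss(0, noise_sigma))

def est_amplitude(noise_sigma):    # correlate against the known reference
    return 2 / N * sum(sample(t, noise_sigma) * math.sin(2 * math.pi * FREQ * t)
                       for t in range(N))

print(est_amplitude(0.0))  # 0.0 -- every sample quantizes to zero
print(est_amplitude(0.5))  # ~0.4 -- the sub-resolution signal, recovered
```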
[EDITED to add:] It may be worth observing that there’s nothing super-special about adding random stuff for this purpose. E.g., suppose you’re trying to measure some non-varying value using an analogue-to-digital converter, and the value you’re trying to measure is smaller than the resolution in your ADC. You could (as discussed above) add noise and average. But if you happen to have the ability to add non-random offsets to your data before measuring, you can do that and get better results than with random offsets.
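For instance (again a sketch with invented numbers), to measure a constant 0.3 on a unit-resolution ADC you can sweep a known ramp of offsets across one quantization step and average; the error then shrinks like 1/M in the number of readings, instead of the 1/sqrt(M) you get from random dither.

```python
import math

M = 1000
VALUE = 0.3                        # constant to measure, below the ADC step

def adc(x):                        # unit-resolution converter (round half up)
    return math.floor(x + 0.5)

offsets = [k / M - 0.5 for k in range(M)]   # known ramp spanning one step
estimate = sum(adc(VALUE + o) for o in offsets) / M
print(estimate)                    # ~0.3, accurate to about 1/M
```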
In other words, this is not an exception to the principle Eliezer proposes, that anything you can improve by adding randomness you can improve at least as much by adding something not-so-random instead.