Noise Simplifies
They say that the flap of a butterfly’s wings can change the course of a hurricane. Sadly, this does not give us an effective strategy for controlling hurricane trajectories—if a flap of my butterfly’s wings can change the hurricane’s course, then so can the flap of any other butterfly’s wings. Unless I know exactly how all of the world’s butterflies are flapping, I do not know how my own butterfly’s wings must flap to achieve the course I want.
On the other hand, even though I do not know how the vast majority of the world’s butterflies are flapping, it is far easier to estimate a distribution of possible hurricane-paths in my state of ignorance than it would be to compute the exact hurricane path in a state of full knowledge.
In general, we usually think of computing with uncertainty as harder than computing without it—Bayesian reasoning requires tracking all possible worlds, which is exponentially more difficult than just tracking one. Yet in practice, uncertainty often makes large systems simpler to reason about. Why? Because noise wipes out long-range correlations. Unless I know how the wings of all the world’s butterflies are flapping, the flapping of any particular butterfly gives me no information at all about the path of a hurricane. So in practice, weather forecasters do not need to worry about tracking butterflies.
Let’s look at a toy model, to build some intuition for this idea.
I have a long list of randomly-chosen numbers between 1 and 10, and I want to know whether their sum is even or odd. If I know all of the numbers, then I can perform a relatively complicated calculation: I can add them all up, divide by 2, and find the remainder. But if there is even just one number in the list whose value I do not know, then all the rest of the numbers tell me absolutely nothing about whether the sum is even or odd. Even just a little bit of noise wipes out all of the signal completely.
On the other hand, if I don’t know the values of one or more of the numbers, then my “model” for this system is much simpler: whether the sum is even or odd is independent of all the numbers I know. If I’m reasoning about the sum, I can just forget about all those numbers entirely. In this sense, noise makes the system much simpler to reason about; there’s no need for all that addition and division, and we don’t even need to remember the numbers.
A rough general insight from this model: if changing any input of a system changes the output, then a complete lack of information about any input implies a complete lack of information about the output—no matter how much information we have about all the other inputs. When a system is very sensitive to all of its inputs, just a little bit of noise makes all of our information about the inputs irrelevant—which makes the system much simpler to model.
On the other hand, what if a system is not very sensitive to all of its inputs? Well, then we can build simplified approximate models of the system anyway, just by using rough estimates for whatever inputs it isn’t very sensitive to. There’s a kind of duality here: if the system isn’t very sensitive, we can model it well as only depending on some coarse estimates of the inputs; if it is very sensitive, we can model it as independent of most inputs as long as just a few are unknown.
It seems weird to describe the system as “simpler to model” when it seems more that “the best model is simple”. Like, I can still use a coin flip model for the list of numbers if I know all the numbers; I just also can use a more complicated, more accurate model.
I find your example here somewhat misleading. Suppose your random numbers weren’t randomly drawn from 1-10, but from {2,4,6,8,10}∪{1}. If you don’t know a single number, you still know that there is a 5:1 chance that it will be even (and hence not change the parity of the sum of the whole list). So if a single number is unknown, you will still want to take the sum of the ones you do know. In this light, your example seems like an exception, rather than the norm. (My main issue with it is that since it feels very ad-hoc, you might subconsciously come to the impression that the described behaviour is the norm.)
However, it might easily be that the class of these “exception” is important on its own. So I wouldn’t want to shoot down the overall idea described in the post—I like it :-).
This is basically correct if only a single number is unknown. But note that, as the amount of unknown numbers increases, the odds ratio for the sum being even quickly decays toward 1:1. If the odds are n:1 with a single unknown number, then ~n unknown numbers should put us close to 1:1 (and we should approach 1:1 asymptotically at a rate which scales inversely with n).
That’s the more realistic version of the thought-experiment: we have N inputs, and any single input unknown would leave us with at-worst n:1 odds on guessing the outcome. As long as N >> n, and a nontrivial fraction of the inputs are unknown, the signal is wiped out.