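That’s counterintuitive. Surely for every $f_1, f_2, \dots, f_{n-1}$ there’s an $f_n$ that’ll get you anywhere? Since $\mathcal{F}\{f * g\} = \mathcal{F}\{f\}\,\mathcal{F}\{g\}$, take $f_n := \mathcal{F}^{-1}\{\mathcal{F}\{\text{target}\} / \mathcal{F}\{f_1 * f_2 * \cdots * f_{n-1}\}\}$.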
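The construction can be tried on a toy case. This is only a sketch under assumptions that are not in the comment: the distributions live on a cyclic grid of 8 points (so the discrete Fourier transform applies exactly), and `solve_last_factor`, `circ_conv`, `partial` and `target` are hypothetical names chosen for the example. The division only works when the transform of the partial convolution has no zeros.

```python
import numpy as np

def circ_conv(a, b):
    """Circular convolution of two equal-length real arrays via the DFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def solve_last_factor(target, partial):
    """Return f_n such that circ_conv(partial, f_n) equals target.

    Assumes the DFT of `partial` is nonzero everywhere; otherwise the
    division below is undefined and no such f_n need exist.
    """
    partial_hat = np.fft.fft(partial)
    if np.any(np.abs(partial_hat) < 1e-12):
        raise ValueError("transform of the partial convolution has a zero")
    return np.real(np.fft.ifft(np.fft.fft(target) / partial_hat))

# Stand-in for f_1 * ... * f_{n-1}: an arbitrary pmf on 8 cyclic grid points.
partial = np.array([0.5, 0.3, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0])
# Target: all mass on the point 3, about as non-Gaussian as it gets.
target = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0])

f_n = solve_last_factor(target, partial)
recovered = circ_conv(partial, f_n)
```

In this toy case `recovered` matches `target`, but `f_n` has negative entries, so it is not a probability distribution. That seems to be where the "nice" qualifier starts to bite.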
Yes, that sounds right: such an $f_n$ exists. And expressing it in terms of Fourier transforms makes it clear. So the “not much” in “doesn’t much matter” is doing a lot of work.
I took his meaning as something like “reasonably small changes to the distributions $d_i$ in $D^* = d_1 * \cdots * d_n$ don’t change the qualitative properties of $D^*$”. I liked that he pointed it out, because a common version of the CLT stipulates that the random variables must be identically distributed, and I really want readers here to know: No! That isn’t necessary! The distributions can be different (as long as they’re not too different)!
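The point about non-identical distributions can be checked numerically. A minimal sketch, assuming 200 randomly generated pmfs on {0, ..., 5} as the "different but not too different" distributions; `random_pmf` and the specific sizes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
values = np.arange(6)

def random_pmf(size, rng):
    """A random probability mass function with `size` positive entries."""
    weights = rng.random(size)
    return weights / weights.sum()

# 200 different distributions, all supported on {0, ..., 5}.
pmfs = [random_pmf(values.size, rng) for _ in range(200)]

# Distribution of the sum = convolution of the individual pmfs.
total = np.array([1.0])
for p in pmfs:
    total = np.convolve(total, p)

# Means and variances add for independent summands.
mean = sum(values @ p for p in pmfs)
var = sum((values**2) @ p - (values @ p) ** 2 for p in pmfs)

# Gaussian density with the same mean and variance, evaluated on the support.
support = np.arange(total.size)
gauss = np.exp(-((support - mean) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)

max_abs_error = np.abs(total - gauss).max()
```

Comparing `max_abs_error` against `gauss.max()` gives a rough sense of how close the convolution is to a Gaussian even though no two summands share a distribution.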
But it sounds like you’re taking it more literally. Hm. Maybe I should edit that part a bit.
The Fourier transform as a map between function spaces is continuous, one-to-one [1] and maps Gaussians to Gaussians, so we can translate “convolving nice distribution sequences tends towards Gaussians” to “multiplying nice function sequences tends towards Gaussians”. The pointwise logarithm [2] as a map between function spaces is continuous, one-to-one [3] and maps Gaussians to parabolas, so we can translate further to “nice function series tend towards parabolas”, which sounds more like “almost always false” than “usually true”.
[1] In the second space, functions are continuous and vanish at infinity.
[2] This doesn’t work if the Fourier transform of some function is somewhere negative, but then the product of the function sequence has zeroes.
[3] In the third space, functions are continuous and diverge down at infinity.
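Written out, the chain of translations looks roughly like this. The transform convention (kernel $e^{-2\pi i \theta x}$) and the symbol $\sigma$ are assumptions made for the example, and the footnoted caveats still apply.

```latex
\mathcal{F}\{d_1 * \cdots * d_n\} = \prod_{i=1}^{n} \mathcal{F}\{d_i\},
\qquad
\log \prod_{i=1}^{n} \mathcal{F}\{d_i\} = \sum_{i=1}^{n} \log \mathcal{F}\{d_i\}

\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x^2/(2\sigma^2)}
\;\xrightarrow{\;\mathcal{F}\;}\;
e^{-2\pi^2\sigma^2\theta^2}
\;\xrightarrow{\;\log\;}\;
-2\pi^2\sigma^2\theta^2
```

So a Gaussian limit for the convolution corresponds to a downward parabola as the limit of the sum of log-transforms.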
Hm. What you’re saying sounds reasonable, and is an interesting way to look at it, but then I’m having trouble reconciling it with how widely the central limit theorem applies in practice. Is the difference just that the space of functions is much larger than the space of probability distributions people typically work with? For now I’ve added an asterisk telling readers to look down here for some caution on the kernel quote.
Suppose you take a bunch of differentiable functions, all of which have a global maximum at 0, and add them pointwise.
Usually you will get a single peak at 0 towering above the rest. The only special case is if $\exists x \neq 0 : \forall i : f_i(x) = f_i(0)$. In the neighbourhood of 0, the sum is approximately parabolic. (It’s differentiable.) You take the exponential, and this squashes everything but the highest peak down to nearly 0 (in relative terms). The highest peak turns into a sharp, spiky Gaussian. You take the inverse Fourier transform and get a shallow Gaussian.
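The squashing step can be made a little more explicit. A rough sketch, assuming (beyond what is stated above) that each $f_i$ is twice differentiable at 0 and that the sum $S$ has $S''(0) < 0$; the first-order term vanishes because 0 is a maximum.

```latex
S(\theta) = \sum_{i=1}^{n} f_i(\theta)
          \approx S(0) + \tfrac{1}{2} S''(0)\,\theta^2 ,
\qquad
e^{S(\theta)} \approx e^{S(0)} \exp\!\left(-\frac{\theta^2}{2 s^2}\right),
\quad s^2 = \frac{1}{|S''(0)|}
```

If $|S''(0)|$ grows as more functions are added, $s$ shrinks, which is the sharp spike; inverting the transform then turns a narrow Gaussian into a wide one.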
Even if you are unlucky enough to start with several equally high peaks in your $f_i$’s, you still get something that’s kind of a Gaussian. This is the case of a perfectly multimodal distribution, something that is 0 except on exact multiples of a number. The number of heads in a million coin flips forms a Gaussian out of Dirac deltas at the integers.
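The coin-flip case shows the "several equally high peaks" situation directly on the transform side. A small sketch, assuming a fair coin with outcomes {0, 1}, the kernel $e^{2\pi i \theta x}$, and 1000 flips rather than a million; `coin_transform_power` is a hypothetical helper name.

```python
import numpy as np

def coin_transform_power(theta, n):
    """n-th power of the transform of a fair coin's pmf on {0, 1}.

    The single-flip transform is (1 + exp(2*pi*i*theta)) / 2, and the
    transform of the sum of n independent flips is its n-th power.
    """
    theta = np.asarray(theta, dtype=float)
    return ((1 + np.exp(2j * np.pi * theta)) / 2) ** n

n = 1000

# At every integer theta the modulus stays at its maximum value.
at_integers = np.abs(coin_transform_power([-2, -1, 0, 1, 2], n))

# Away from the integers the n-th power is squashed towards 0.
between = np.abs(coin_transform_power([0.1, 0.25], n))
```

The peaks in `at_integers` are all equally high, so no single peak wins; this periodicity is the transform-side counterpart of the mass sitting only on the integers.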
But the condition of having a maximum at 0 in the Fourier transform is weaker than always being positive. If $\forall x : f(x) \ge 0$ then $\int f(x)\,dx \ge \left|\int f(x)\,e^{2\pi i \theta x}\,dx\right|$.
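The inequality follows from the triangle inequality for integrals, assuming $f$ is integrable and reading the right-hand side as a modulus:

```latex
\left| \int f(x)\, e^{2\pi i \theta x}\, dx \right|
\le \int |f(x)|\, \bigl| e^{2\pi i \theta x} \bigr| \, dx
= \int f(x)\, dx
= \mathcal{F}\{f\}(0)
```

The middle step uses $|e^{2\pi i \theta x}| = 1$ and $|f(x)| = f(x)$ for nonnegative $f$.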