You may have heard of convolutional neural networks from all their success in image processing. I think they involve the same convolution I’m talking about here, but in much higher dimensions.
CNNs (whether for image processing or for playing Go) convolve over a 2-dimensional grid rather than a 1d function, and the variables being convolved are high-dimensional vectors instead of the scalars in this post, though each element of such a vector behaves much like a scalar on its own under convolution. The main reason the central limit theorem doesn’t apply to CNNs, which typically apply several convolutions in sequence, is that at least one non-linear transformation sits between consecutive convolutions, and that non-linearity gives rise to complex dynamics within the neural net.
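To make that contrast concrete, here is a minimal numerical sketch (my own illustration, not something from the post): repeatedly convolving a small 1d kernel with itself drifts toward a Gaussian shape, while interleaving a clipping non-linearity between convolutions, as a crude stand-in for a CNN's activation function, keeps the result away from a Gaussian. The kernel, the clipping rule, and the "relative sup distance" metric are all arbitrary choices on my part.

```python
import numpy as np

def gaussian_gap(p):
    """Relative sup distance between a discrete density p and the Gaussian
    with the same mean and variance."""
    x = np.arange(p.size)
    mean = (x * p).sum()
    var = ((x - mean) ** 2 * p).sum()
    g = np.exp(-(x - mean) ** 2 / (2 * var))
    return np.abs(p - g / g.sum()).max() / p.max()

kernel = np.array([0.1, 0.2, 0.4, 0.2, 0.1])   # an arbitrary 1d "distribution"

linear, clipped = kernel.copy(), kernel.copy()
for step in range(1, 16):
    # Pure convolution: the n-fold result is what the CLT is about.
    linear = np.convolve(linear, kernel)
    # Convolution with a clipping non-linearity in between (a crude stand-in
    # for a CNN activation), renormalized so it stays a density.
    clipped = np.convolve(clipped, kernel)
    clipped = np.maximum(clipped - 0.5 * clipped.max(), 0.0)
    clipped /= clipped.sum()
    print(step, gaussian_gap(linear), gaussian_gap(clipped))
```

Running this, the pure-convolution gap should keep shrinking with each step, while the clipped iteration should settle at a much larger gap: the CLT applying in the first case and failing in the second.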
Gotcha. The non-linearity part “breaking” things makes sense. The main uncertainty in my head right now is whether repeatedly convolving in 2d requires more convolutions to get near-Gaussian than in 1d. In dimension m, do you need m times as many distributions, more than m times as many, or can you get away with the same number of convolutions as in 1d? Does convergence get a lot harder as dimension increases, or does nothing special happen?
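One way to probe that question empirically (a sketch under my own assumptions, not something from this exchange): track how fast the n-fold self-convolution of a 1d kernel and of a 2d kernel approach their matched Gaussians. I'm using a separable 2d kernel and comparing only per-axis means and variances, which ignores cross-correlations, so treat it as a rough probe rather than a proper test.

```python
import numpy as np
from scipy.signal import fftconvolve

def gaussian_gap(p):
    """Relative sup distance between a discrete density p (any dimension) and
    the product of per-axis Gaussians matching its per-axis mean and variance
    (ignores cross-correlation, which is fine for a separable kernel)."""
    grids = np.meshgrid(*[np.arange(n) for n in p.shape], indexing="ij")
    g = np.ones_like(p)
    for x in grids:
        mean = (x * p).sum()
        var = ((x - mean) ** 2 * p).sum()
        g = g * np.exp(-(x - mean) ** 2 / (2 * var))
    return np.abs(p - g / g.sum()).max() / p.max()

k1 = np.array([0.1, 0.2, 0.4, 0.2, 0.1])   # an arbitrary 1d kernel
k2 = np.outer(k1, k1)                      # a separable 2d kernel built from it

p1, p2 = k1.copy(), k2.copy()
for step in range(1, 11):
    p1 = np.convolve(p1, k1)        # n-fold self-convolution in 1d
    p2 = fftconvolve(p2, k2)        # n-fold self-convolution in 2d
    print(step, gaussian_gap(p1), gaussian_gap(p2))
```

The per-step trend of the two columns is the thing to compare; the separable kernel is a deliberate simplification so that the per-axis Gaussian comparison is meaningful.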