Isn’t this extremely easy to directly verify empirically?
Take a neural network $f$ trained on some standard task, like ImageNet or something. Evaluate $|f(kx) - kf(x)|$ on a bunch of samples $x$ from the dataset, and $|f(x+y) - f(x) - f(y)|$ on pairs of samples $x, y$. If it’s “almost linear”, then the difference should be very small on average. I’m not sure right now how to define “very small”, but you could compare it e.g. to the distance distribution $|f(x) - f(y)|$ of independent samples, also depending on what the head is.
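For concreteness, here is a minimal sketch of that test, assuming PyTorch and a pretrained torchvision classifier standing in for $f$ (the model choice and the function name `linearity_gaps` are just illustrative):

```python
import torch
from torchvision.models import resnet18, ResNet18_Weights

# Stand-in for f: a pretrained ImageNet classifier mapping images to logits.
model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()

@torch.no_grad()
def linearity_gaps(x, y, k=2.0):
    """x, y: batches of images of the same shape."""
    fx, fy = model(x), model(y)
    # Homogeneity gap: |f(kx) - k f(x)|
    hom = (model(k * x) - k * fx).norm(dim=-1)
    # Additivity gap: |f(x + y) - f(x) - f(y)|
    add = (model(x + y) - fx - fy).norm(dim=-1)
    # Baseline scale: |f(x) - f(y)| on independent samples.
    base = (fx - fy).norm(dim=-1)
    return hom.mean(), add.mean(), base.mean()
```

If the homogeneity and additivity gaps come out comparable to the baseline distances, rather than much smaller, that would count against literal input-output linearity.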
FWIW my opinion is that all this “circumstantial evidence” is a big non sequitur, and the base statement is fundamentally wrong. But it seems like such an easily testable hypothesis that it’s more effort to discuss it than to actually verify it.
At least how I would put this—I don’t think the important part is that NNs are literally almost linear, when viewed as input-output functions. More like, they have linearly represented features (i.e. directions in activation space, either in the network as a whole or at a fixed layer), or there are other important linear statistics of their weights (linear mode connectivity) or activations (linear probing).
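As one concrete instance of the last point, a linear probe just fits a single linear map on frozen activations to check whether a feature is linearly decodable; a minimal sketch (the function name and training setup are illustrative, not from the original comment):

```python
import torch
import torch.nn as nn

def train_linear_probe(acts, labels, n_classes, steps=200, lr=1e-2):
    """acts: (N, d) frozen activations; labels: (N,) integer targets."""
    probe = nn.Linear(acts.shape[-1], n_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.cross_entropy(probe(acts), labels).backward()
        opt.step()
    return probe  # high probe accuracy => feature is linearly decodable
```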
Maybe beren can clarify what they had in mind, though.
Yes. The idea is that the latent space of the neural network’s ‘features’ is ‘almost linear’, which is reflected in the linear-ish properties of both the weights and the activations. Not that the literal I/O mapping of the NN is linear, which is clearly false.
More concretely, as an oversimplified version of what I am saying, it might be possible to think of neural networks as a combined encoder and decoder to a linear vector space. I.e. we have nonlinear functions $f$ and $g$, where $f$ encodes the input $x$ to a latent space $z$ and $g$ decodes it to the output $y$, i.e. $f(x) = z$ and $g(z) = y$. We then hypothesise that the latent space $z$ is approximately linear, such that we can perform addition and weighted sums of $z$s, as well as scaling of individual directions in $z$, and these get decoded to outputs corresponding to sums or scalings of the ‘natural’ semantic features we should expect in the input or output.
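To make that testable, here is a minimal sketch, assuming we can split a trained network into an encoder $f$ (layers up to some intermediate point) and a decoder $g$ (the remaining layers); all names here are hypothetical:

```python
import torch

@torch.no_grad()
def latent_linearity_test(f, g, x1, x2, alpha=0.5):
    """f, g: encoder/decoder halves of a trained network (hypothetical split)."""
    z1, z2 = f(x1), f(x2)
    # Decode a weighted sum of latents...
    y_mix = g(alpha * z1 + (1 - alpha) * z2)
    # ...and compare with the outputs of the unmodified network.
    y1, y2 = g(z1), g(z2)
    # Under the hypothesis, y_mix should be a semantically sensible blend
    # of y1 and y2, not an off-distribution output.
    return y_mix, y1, y2
```

The interesting design question is then where to put the split: the hypothesis only needs *some* intermediate layer at which such interpolations decode sensibly, not all of them.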