A key distinction is between linearity in the weights vs. linearity in the input data.
For example, the function f(a, b, x, y) = a·sin(x) + b·cos(y) is linear in the arguments a and b but nonlinear in the arguments x and y, since sin and cos are nonlinear.
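As a quick check (a minimal sketch using jax; the particular numbers are arbitrary), additivity and homogeneity hold exactly in (a, b) but fail in (x, y):

```python
import jax.numpy as jnp

def f(a, b, x, y):
    return a * jnp.sin(x) + b * jnp.cos(y)

x, y = 0.7, -1.3                      # fixed inputs
a1, b1, a2, b2, c = 2.0, -0.5, 1.5, 3.0, 4.0

# Linearity in (a, b): additivity and homogeneity hold exactly.
add_ok = jnp.allclose(f(a1 + a2, b1 + b2, x, y), f(a1, b1, x, y) + f(a2, b2, x, y))
hom_ok = jnp.allclose(f(c * a1, c * b1, x, y), c * f(a1, b1, x, y))
print(add_ok, hom_ok)                 # True True

# But not in (x, y): scaling the inputs does not scale the output.
print(jnp.allclose(f(a1, b1, 2 * x, 2 * y), 2 * f(a1, b1, x, y)))  # False
```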
Similarly, we have evidence that wide neural networks f(x; θ) are (almost) linear in the parameters θ, even though they are nonlinear in the input x (e.g. because of nonlinear activation functions such as ReLU). So nonlinear activation functions are not a counterargument to linearity with respect to the parameters.
If this is so, then neural networks are almost a type of kernel machine, performing linear learning in a space of features that are themselves a fixed nonlinear function of the input data.
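To make "almost linear in θ" concrete, here is a minimal sketch (assuming jax; the MLP architecture, widths, and perturbation size are illustrative choices, not taken from any particular paper): a first-order expansion f(x; θ₀) + J(θ₀)(θ − θ₀) of a wide ReLU network around its initialization closely tracks the true outputs for small parameter changes.

```python
import jax
import jax.numpy as jnp

def init_mlp(key, sizes):
    # Random MLP parameters: a list of (W, b) pairs.
    params = []
    for din, dout in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (din, dout)) / jnp.sqrt(din),
                       jnp.zeros(dout)))
    return params

def mlp(params, x):
    # Nonlinear in x (ReLU activations); we study linearity in the parameters.
    h = x
    for W, b in params[:-1]:
        h = jax.nn.relu(h @ W + b)
    W, b = params[-1]
    return (h @ W + b).squeeze()

key_x, key_init, key_delta = jax.random.split(jax.random.PRNGKey(0), 3)
x = jax.random.normal(key_x, (8, 16))               # a small batch of inputs
theta0 = init_mlp(key_init, [16, 2048, 2048, 1])    # wide hidden layers

# First-order expansion around theta0: f_lin(theta) = f(theta0) + J(theta0) @ (theta - theta0).
f_at_theta0, f_lin_delta = jax.linearize(lambda p: mlp(p, x), theta0)

# A small random parameter perturbation, standing in for a short stretch of training.
delta = jax.tree_util.tree_map(
    lambda w: 1e-3 * jax.random.normal(key_delta, w.shape), theta0)
theta = jax.tree_util.tree_map(lambda w, d: w + d, theta0, delta)

exact = mlp(theta, x)                               # true network output
linear = f_at_theta0 + f_lin_delta(delta)           # output of the linearized model
print(jnp.max(jnp.abs(exact - linear)))             # small for wide layers and small delta
```

In this picture, the features of the implied kernel machine are the rows of the Jacobian J(θ₀): a fixed nonlinear function of the input, with learning acting linearly on θ.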