Neural networks with ReLU activation are what you obtain when you combine two kinds of linearity: the standard linearity we should all be familiar with, and tropical linearity.
Give R two operations ⊕,⊗ defined by setting x⊕y=max(x,y) and x⊗y=x+y. Then the operations ⊕,⊗ are associative and commutative, and they satisfy the distributivity property x⊗(y⊕z)=(x⊗y)⊕(x⊗z). We shall call ⊕,⊗ the tropical operations on R.
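As a quick sanity check (a minimal sketch of my own, not taken from any library), the operations and the distributivity property can be written out in a few lines of Python:

```python
# Tropical operations on R: x ⊕ y = max(x, y) and x ⊗ y = x + y.
def trop_add(x, y):
    return max(x, y)

def trop_mul(x, y):
    return x + y

# Spot-check the distributivity property x ⊗ (y ⊕ z) = (x ⊗ y) ⊕ (x ⊗ z).
x, y, z = 2.0, -1.5, 3.0
assert trop_mul(x, trop_add(y, z)) == trop_add(trop_mul(x, y), trop_mul(x, z))
```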
We can even perform matrix and vector operations by replacing the operations +,⋅ with their tropical counterparts ⊕,⊗. More explicitly, we can define the tropical matrix addition and multiplication operations by setting
(a_{i,j})_{i,j} ⊕ (b_{i,j})_{i,j} = (a_{i,j} ⊕ b_{i,j})_{i,j} = (max(a_{i,j}, b_{i,j}))_{i,j} and
(a_{i,j})_{i,j} ⊗ (b_{i,j})_{i,j} = (⊕_k a_{i,k} ⊗ b_{k,j})_{i,j} = (max_k(a_{i,k} + b_{k,j}))_{i,j}.
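For concreteness, here is a small numpy sketch of these two matrix operations; the names trop_matadd and trop_matmul are just my own labels for the formulas above:

```python
import numpy as np

def trop_matadd(A, B):
    # Tropical matrix addition: entrywise maximum.
    return np.maximum(A, B)

def trop_matmul(A, B):
    # Tropical matrix multiplication: (A ⊗ B)_{i,j} = max_k (a_{i,k} + b_{k,j}).
    # A has shape (m, n) and B has shape (n, p); broadcasting gives (m, n, p),
    # and the maximum over the middle axis plays the role of the sum over k.
    return (A[:, :, None] + B[None, :, :]).max(axis=1)

A = np.array([[0.0, 1.0], [2.0, -1.0]])
B = np.array([[3.0, 0.0], [1.0, 4.0]])
print(trop_matadd(A, B))  # [[3. 1.] [2. 4.]]
print(trop_matmul(A, B))  # [[3. 5.] [5. 3.]]
```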
Here, the ReLU operation is just ReLU(x)=x⊕0, and if 0 denotes the zero vector and v is a real vector, then ReLU(v)=v⊕0 with ⊕ applied entrywise, so ReLU only needs tropical addition and does not even rely on tropical matrix multiplication.
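In code, this observation is literally one line: ReLU is an entrywise tropical addition with zero.

```python
import numpy as np

def relu(v):
    # ReLU(v) = v ⊕ 0, i.e. an entrywise maximum with the zero vector.
    return np.maximum(v, 0.0)

print(relu(np.array([-2.0, 0.5, 3.0])))  # [0.  0.5 3. ]
```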
Of course, one can certainly construct and train neural networks whose layers use tropical matrix multiplication, say layers of the form v↦(B⊗(Av+a))⊕b, where A,B are weight matrices and a,b are bias vectors. However, I do not know of any experiments with these sorts of networks, so I am uncertain what advantages they offer.
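To make the shape of such a layer concrete, here is a purely illustrative numpy sketch of the map v↦(B⊗(Av+a))⊕b; the function names are my own, and nothing here has been tuned or tested for actual training:

```python
import numpy as np

def trop_matvec(M, v):
    # Tropical matrix-vector product: (M ⊗ v)_i = max_j (m_{i,j} + v_j).
    return (M + v[None, :]).max(axis=1)

def tropical_layer(v, A, a, B, b):
    # v ↦ (B ⊗ (Av + a)) ⊕ b: a standard affine map followed by a tropical one.
    h = A @ v + a                            # standard linear step
    return np.maximum(trop_matvec(B, h), b)  # tropical step

rng = np.random.default_rng(0)
A, a = rng.normal(size=(4, 3)), rng.normal(size=4)
B, b = rng.normal(size=(2, 4)), rng.normal(size=2)
print(tropical_layer(rng.normal(size=3), A, a, B, b))
```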
Since ReLU neural networks are a combination of two kinds of linearity, one might expect ReLU networks to behave nearly linearly. And it is not surprising that ReLU networks look more like standard linear transformations than tropical linear transformations, since the standard linear transformations in a neural network are far more complex than the ReLU. ReLU just provides the bare minimum non-linearity for a neural network without doing anything fancy.