If the singular value decompositions of various matrices in neural networks are interpretable, then I suspect that other spectral techniques that take more non-linear interactions into consideration would be interpretable as well. In my research on cryptocurrency technologies, I have been investigating spectral techniques for evaluating block ciphers including the AES, and it seems like these spectral techniques may also be used to interpret matrices in neural networks (though I still need to run experiments to figure out how well this actually works in practice).
The singular value decomposition treats a weight matrix as a linear transformation between inner product spaces while ignoring all additional structure on those spaces, so we may be able to use better spectral techniques for analyzing matrices in neural networks (or other properties of neural networks), since these other spectral techniques do not necessarily ignore the structure that neural networks contain. The SVD has other disadvantages as well: the orthogonal matrices are numerically unstable when singular values are close together, the SVD does not generalize to higher order tensors (higher order SVDs lose most of the interesting properties of the traditional SVD), and the SVD cannot find clusters of related dimensions (the top singular vectors may have nothing to do with each other, so the SVD is not a clustering algorithm, and one cannot use PCA to find a cluster of dimensions). The singular value decomposition has been around since the 1800s, so it is not exactly a cutting edge technique.
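To make the nearby-singular-value instability concrete, here is a minimal numerical illustration in Python/NumPy. The specific 3×3 matrix and perturbation are hand-picked purely for illustration and do not come from any neural network.

```python
import numpy as np

# When two singular values are nearly equal, a perturbation far smaller than
# the matrix itself can rotate the corresponding singular vectors by a large
# angle, while singular vectors of well-separated singular values stay put.
A = np.diag([1.0, 1.0 + 1e-9, 0.1])   # two nearly equal singular values
E = np.zeros((3, 3))
E[0, 1] = E[1, 0] = 1e-6              # tiny coupling between the two near-equal directions

U1, s1, _ = np.linalg.svd(A)
U2, s2, _ = np.linalg.svd(A + E)

print("max singular value change:", np.max(np.abs(s1 - s2)))     # about 1e-6
print("|<u1, u1'>| =", abs(U1[:, 0] @ U2[:, 0]))  # about 0.71: rotated by roughly 45 degrees
print("|<u3, u3'>| =", abs(U1[:, 2] @ U2[:, 2]))  # 1.0: the isolated singular value is stable
```

The singular values barely move, but the top singular vectors swing by about 45 degrees, which is the kind of instability described above.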
Suppose that we encode data and parameters as a collection (A_1,…,A_r) ∈ M_n(ℝ)^r of square matrices for some r > 1. We then reduce the dimensionality of these square matrices (I call this dimensionality reduction the L_{2,d}-spectral radius dimensionality reduction, abbreviated LSRDR) by finding R ∈ M_{d,n}(ℝ) and S ∈ M_{n,d}(ℝ) with RS = 1_d that maximize

ρ(A_1 ⊗ RA_1S + ⋯ + A_r ⊗ RA_rS) / ρ(RA_1S ⊗ RA_1S + ⋯ + RA_rS ⊗ RA_rS)^{1/2}

using gradient ascent, where ρ denotes the spectral radius and ⊗ denotes the tensor product (the gradient ascent is quicker than you might think, even though we are using tensor products and the spectral radius). The matrices RA_1S,…,RA_rS are the matrices of reduced dimensionality.

Let P = SR. Then P^2 = S(RS)R = S·1_d·R = SR = P, so P is a projection matrix, though not necessarily an orthogonal projection, and the vector spaces ker(P)^⊥ and im(P) are clusters of dimensions in ℝ^n. Like the singular value decomposition, the matrix P is often (but not always) unique. When P is unique, you know that your LSRDR is well-behaved and that LSRDRs are probably a good tool for whatever you are using them for, while non-uniqueness of P indicates that LSRDRs may not be the best tool for the task (and there are other indicators of whether LSRDRs are doing anything meaningful). One can perform the same dimensionality reduction for any completely positive superoperator E: M_n(ℝ) → M_n(ℝ). We therefore need to find ways of representing parts of neural networks, or parts of the data coming into neural networks, as 3rd and 4th order tensors, as collections of matrices, or as completely positive superoperators (though the higher order tensors need to lie in a tensor product of the form U ⊗ V ⊗ V or V ⊗ V ⊗ V ⊗ V for inner product spaces U, V in order for the LSRDR to function).
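Here is a minimal Python/NumPy sketch of this optimization, just to make the objective and the RS = 1_d constraint concrete. The function names (spectral_radius, lsrdr_fitness, fit_lsrdr) and the naive finite-difference gradient ascent with re-projection are illustrative choices for this post, not a tuned implementation; in practice one would use automatic differentiation and a better optimizer.

```python
import numpy as np

def spectral_radius(M):
    """Largest absolute value of an eigenvalue of M."""
    return np.max(np.abs(np.linalg.eigvals(M)))

def lsrdr_fitness(As, R, S):
    """The ratio of spectral radii that an LSRDR maximizes."""
    num = sum(np.kron(A, R @ A @ S) for A in As)
    den = sum(np.kron(R @ A @ S, R @ A @ S) for A in As)
    return spectral_radius(num) / np.sqrt(spectral_radius(den))

def fit_lsrdr(As, d, steps=500, lr=1e-2, eps=1e-5, seed=0):
    """Naive finite-difference gradient ascent, re-projecting so that R S = 1_d."""
    rng = np.random.default_rng(seed)
    n = As[0].shape[0]
    R = rng.standard_normal((d, n))
    S = rng.standard_normal((n, d))
    S = S @ np.linalg.inv(R @ S)              # enforce R S = 1_d at the start
    for _ in range(steps):
        base = lsrdr_fitness(As, R, S)
        grads = [np.zeros_like(R), np.zeros_like(S)]
        for M, G in zip((R, S), grads):       # crude finite-difference gradient
            for idx in np.ndindex(M.shape):
                M[idx] += eps
                G[idx] = (lsrdr_fitness(As, R, S) - base) / eps
                M[idx] -= eps
        R += lr * grads[0]
        S += lr * grads[1]
        S = S @ np.linalg.inv(R @ S)          # project back onto R S = 1_d
    return R, S

# Example: reduce r = 3 random 8x8 matrices down to d = 3 dimensions.
rng = np.random.default_rng(1)
As = [rng.standard_normal((8, 8)) for _ in range(3)]
R, S = fit_lsrdr(As, d=3)
P = S @ R                                     # P^2 = P: a (possibly non-orthogonal) projection
print("fitness:", lsrdr_fitness(As, R, S))
print("||P^2 - P|| =", np.linalg.norm(P @ P - P))
```

Random matrices are used above only so the sketch is self-contained; for interpretability work, the A_i would instead be whatever collection of matrices one extracts from a neural network.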
There are probably several ways of using LSRDRs to interpret neural networks, but I still need to run experiments applying LSRDRs to neural networks to see how well they work and to establish best practices for using them in interpretability.