I personally like my machine learning algorithms to behave mathematically, especially when I give them mathematical data. For example, a fitness function with apparently just one local maximum value is a mathematical fitness function. It is even more mathematical if one can prove theorems about such a fitness function, or if one can completely describe its local maxima. Fitness functions that satisfy these mathematical properties seem more interpretable than fitness functions that do not, so people should investigate such functions for AI safety purposes.
My notion of an LSRDR satisfies these mathematical properties. To demonstrate the mathematical behavior of LSRDRs, let's see what happens when we take an LSRDR of the octonions.
Let $K$ denote either the field of real numbers or the field of complex numbers ($K$ could also be the division ring of quaternions, but for simplicity, let's not go there). If $A_1,\dots,A_r$ are $n\times n$ matrices over $K$, then an LSRDR ($L_{2,d}$-spectral radius dimensionality reduction) of $A_1,\dots,A_r$ is a collection $X_1,\dots,X_r$ of $d\times d$ matrices that locally maximizes the fitness level

$$\frac{\rho(A_1\otimes\overline{X_1}+\dots+A_r\otimes\overline{X_r})}{\rho(X_1\otimes\overline{X_1}+\dots+X_r\otimes\overline{X_r})^{1/2}}.$$

Here $\rho$ denotes the spectral radius function, $\otimes$ denotes the tensor product, and $\overline{Z}$ denotes the matrix obtained from $Z$ by replacing each entry with its complex conjugate. We shall call the maximum fitness level the $L_{2,d}$-spectral radius of $A_1,\dots,A_r$ over the field $K$, and we shall write $\rho^K_{2,d}(A_1,\dots,A_r)$ for this spectral radius.
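As a concrete illustration, this fitness level is easy to evaluate numerically. The sketch below (my own illustration, not code from this project; the helper names `spectral_radius` and `lsrdr_fitness` are made up) computes it with NumPy for random matrices:

```python
import numpy as np

def spectral_radius(M):
    # Largest absolute value of an eigenvalue of M.
    return np.max(np.abs(np.linalg.eigvals(M)))

def lsrdr_fitness(As, Xs):
    # Fitness level of X_1,...,X_r as a reduction of A_1,...,A_r:
    # rho(sum_i A_i (x) conj(X_i)) / rho(sum_i X_i (x) conj(X_i))^(1/2).
    num = spectral_radius(sum(np.kron(A, np.conj(X)) for A, X in zip(As, Xs)))
    den = spectral_radius(sum(np.kron(X, np.conj(X)) for X in Xs)) ** 0.5
    return num / den

# Example with random matrices (r = 2, n = 4, d = 2); an LSRDR would be
# a tuple (X_1, X_2) at which this fitness is locally maximal.
rng = np.random.default_rng(0)
As = [rng.standard_normal((4, 4)) for _ in range(2)]
Xs = [rng.standard_normal((2, 2)) for _ in range(2)]
print(lsrdr_fitness(As, Xs))
```

Note that the fitness is invariant under rescaling all of the $X_i$ by a common non-zero scalar, so only the "direction" of $(X_1,\dots,X_r)$ matters.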
Define the linear superoperator $\Gamma(A_1,\dots,A_r;X_1,\dots,X_r)$ by setting

$$\Gamma(A_1,\dots,A_r;X_1,\dots,X_r)(X)=A_1XX_1^*+\dots+A_rXX_r^*,$$

and set $\Phi(X_1,\dots,X_r)=\Gamma(X_1,\dots,X_r;X_1,\dots,X_r)$. Then the fitness level of $X_1,\dots,X_r$ is $\rho(\Gamma(A_1,\dots,A_r;X_1,\dots,X_r))/\rho(\Phi(X_1,\dots,X_r))^{1/2}$.
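Under row-major vectorization, $\mathrm{vec}(AXB)=(A\otimes B^{T})\,\mathrm{vec}(X)$, so the matrix of $\Gamma(A_1,\dots,A_r;X_1,\dots,X_r)$ is $A_1\otimes\overline{X_1}+\dots+A_r\otimes\overline{X_r}$, exactly the matrix appearing in the numerator of the fitness level. A quick numerical sanity check of this identity (a sketch with random matrices):

```python
import numpy as np

rng = np.random.default_rng(1)
r, n, d = 3, 4, 2
As = [rng.standard_normal((n, n)) for _ in range(r)]
Xs = [rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
      for _ in range(r)]

def Gamma(X):
    # Gamma(A_1,...,A_r; X_1,...,X_r)(X) = sum_i A_i X X_i^*
    return sum(A @ X @ Xi.conj().T for A, Xi in zip(As, Xs))

# Matrix of Gamma under row-major vectorization: sum_i A_i (x) conj(X_i).
M = sum(np.kron(A, Xi.conj()) for A, Xi in zip(As, Xs))

X = rng.standard_normal((n, d))
# Applying Gamma directly agrees with multiplying by its matrix.
assert np.allclose(Gamma(X).reshape(-1), M @ X.reshape(-1))
```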
Suppose that $V$ is an 8-dimensional real inner product space. Then octonionic multiplication is the unique-up-to-isomorphism bilinear binary operation $*$ on $V$ together with a unit $1$ such that $\|x*y\|=\|x\|\cdot\|y\|$ and $1*x=x*1=x$ for all $x,y\in V$. If we drop the condition that the octonions have a unit, then we do not quite have this uniqueness result.
We say that an octonion-like algebra is an 8-dimensional real inner product space $V$ together with a bilinear operation $*$ such that $\|x*y\|=\|x\|\cdot\|y\|$ for all $x,y$ (by the above remark, such an operation is no longer unique up to isomorphism).
Let V be a specific octonion-like algebra.
Suppose now that $e_1,\dots,e_8$ is an orthonormal basis for $V$ (this does not need to be the standard basis). Then for each $j\in\{1,\dots,8\}$, let $A_j$ be the linear operator from $V$ to $V$ defined by setting $A_jv=e_j*v$ for all vectors $v$. Every non-zero linear combination of $A_1,\dots,A_8$ is a conformal mapping (that is, it preserves angles). Now that we have turned the octonion-like algebra into matrices, we can take an LSRDR of it. When doing so, we do not need to worry about the choice of orthonormal basis $e_1,\dots,e_8$, since everything can be formulated in a coordinate-free manner.
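For readers who want to experiment, here is one concrete way to build such operators (a sketch assuming the standard octonions, obtained by the Cayley-Dickson construction over the quaternions; the helper names are made up):

```python
import numpy as np

def quat_mul(a, b):
    # Hamilton product of quaternions represented as arrays [w, x, y, z].
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def quat_conj(a):
    return np.array([a[0], -a[1], -a[2], -a[3]])

def oct_mul(x, y):
    # Cayley-Dickson: (a, b)(c, d) = (ac - d*b, da + bc*) on quaternion pairs,
    # where * is quaternion conjugation.
    a, b = x[:4], x[4:]
    c, d = y[:4], y[4:]
    return np.concatenate([
        quat_mul(a, c) - quat_mul(quat_conj(d), b),
        quat_mul(d, a) + quat_mul(b, quat_conj(c)),
    ])

# The operators A_j : v -> e_j * v for the standard basis e_1, ..., e_8.
E = np.eye(8)
A = [np.column_stack([oct_mul(E[j], E[k]) for k in range(8)]) for j in range(8)]

# Each A_j is orthogonal, and any non-zero linear combination of the A_j
# is a scalar multiple of an orthogonal matrix, i.e. a conformal map.
for Aj in A:
    assert np.allclose(Aj.T @ Aj, np.eye(8))
c = np.random.default_rng(2).standard_normal(8)
L = sum(cj * Aj for cj, Aj in zip(c, A))
assert np.allclose(L.T @ L, np.linalg.norm(c)**2 * np.eye(8))
```

The final assertion is exactly the conformality claim: $L=\sum_j c_jA_j$ satisfies $L^TL=\|c\|^2I$, so $L$ preserves angles whenever $c\neq 0$.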
Empirical Observation (from computer calculations): Suppose that $1\le d\le 8$ and $K$ is the field of real numbers. Then the following are equivalent.

1. The $d\times d$ matrices $X_1,\dots,X_8$ are an LSRDR of $A_1,\dots,A_8$ over $K$ where $A_1\otimes X_1+\dots+A_8\otimes X_8$ has a unique real dominant eigenvalue.

2. There exist matrices $R,S$ where $X_j=RA_jS$ for all $j$ and where $P=SR$ is an orthogonal projection matrix.

In this case, $\rho^K_{2,d}(A_1,\dots,A_8)=\sqrt{d}$, and this fitness level is attained by the matrices $X_1,\dots,X_8$ in the above equivalent statements. Observe that the superoperator $\Gamma(A_1,\dots,A_8;PA_1P,\dots,PA_8P)$ is similar to a direct sum of $\Gamma(A_1,\dots,A_8;X_1,\dots,X_8)$ and a zero operator. Moreover, the projection matrix $P$ is a dominant eigenvector of $\Gamma(A_1,\dots,A_8;PA_1P,\dots,PA_8P)$ and of $\Phi(PA_1P,\dots,PA_8P)$ as well.
I have no mathematical proof of the above fact though.
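While there is no proof, the special case $d=8$ is easy to check numerically: taking $P=I$ (so $X_j=A_j$), the numerator and denominator of the fitness level involve the same matrix $\sum_j A_j\otimes A_j$, and the fitness should come out to $\sqrt{8}$. A sketch of this check (rebuilding the operators $A_j$ from the standard octonions via Cayley-Dickson, as above):

```python
import numpy as np

def quat_mul(a, b):
    # Hamilton product of quaternions [w, x, y, z].
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def quat_conj(a):
    return np.array([a[0], -a[1], -a[2], -a[3]])

def oct_mul(x, y):
    # Cayley-Dickson construction of octonion multiplication.
    a, b, c, d = x[:4], x[4:], y[:4], y[4:]
    return np.concatenate([quat_mul(a, c) - quat_mul(quat_conj(d), b),
                           quat_mul(d, a) + quat_mul(b, quat_conj(c))])

E = np.eye(8)
A = [np.column_stack([oct_mul(E[j], E[k]) for k in range(8)]) for j in range(8)]

def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

# For d = 8 take P = I, so X_j = A_j; the fitness level is then
# rho(sum_j A_j (x) A_j) / rho(sum_j A_j (x) A_j)^(1/2) = rho(...)^(1/2).
M = sum(np.kron(Aj, Aj) for Aj in A)
fitness = spectral_radius(M) / spectral_radius(M) ** 0.5
print(fitness, np.sqrt(8))  # these should agree numerically
```

Here $\sum_j A_jA_j^T=8I$ since each $A_j$ is orthogonal, so the identity matrix is an eigenvector of the superoperator with eigenvalue $8$, which makes $\rho(\sum_j A_j\otimes A_j)=8$ and the fitness $8/\sqrt{8}=\sqrt{8}$.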
Now suppose that $K=\mathbb{C}$. Then my computer calculations yield the following complex $L_{2,d}$-spectral radii:

$$(\rho^{\mathbb{C}}_{2,j}(A_1,\dots,A_8))_{j=1}^{8}=(2,\,4,\,2+\sqrt{8},\,5.4676355784\dots,\,6.1977259251\dots,\,4+\sqrt{8},\,7.2628726081\dots,\,8).$$
Each time that I have trained a complex LSRDR of A1,…,A8, I was able to find a fitness level that is not just a local optimum but also a global optimum.
In the real case, by contrast, I have a complete description of the LSRDRs of $A_1,\dots,A_8$. This demonstrates that the octonion-like algebras are elegant mathematical structures and that LSRDRs behave mathematically in a manner that is compatible with the structure of the octonion-like algebras.
I have made a few YouTube videos that animate the process of gradient ascent to maximize the fitness level.
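For a rough sense of what such a maximization looks like in code, here is a toy sketch (my own, not the code behind the videos) that climbs the fitness level by naive finite-difference gradient ascent on a small random real example:

```python
import numpy as np

def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

def lsrdr_fitness(As, Xs):
    # rho(sum_i A_i (x) conj(X_i)) / rho(sum_i X_i (x) conj(X_i))^(1/2)
    num = spectral_radius(sum(np.kron(A, np.conj(X)) for A, X in zip(As, Xs)))
    den = spectral_radius(sum(np.kron(X, np.conj(X)) for X in Xs)) ** 0.5
    return num / den

rng = np.random.default_rng(3)
r, n, d = 2, 3, 2
As = [rng.standard_normal((n, n)) for _ in range(r)]
theta = rng.standard_normal(r * d * d) * 0.5  # flattened X_1, ..., X_r

def f(theta):
    return lsrdr_fitness(As, list(theta.reshape(r, d, d)))

# Naive forward-difference gradient ascent; real software would use
# automatic differentiation instead.
eps, lr = 1e-6, 0.05
f0 = f(theta)
best = f0
for step in range(200):
    grad = np.array([(f(theta + eps * e) - f(theta)) / eps
                     for e in np.eye(theta.size)])
    theta = theta + lr * grad
    best = max(best, f(theta))
```

After the loop, `best` records the highest fitness level seen along the ascent; for these operators one would hope it approaches a local (and, empirically, global) maximum.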
Edit: I have made some corrections to this post on 9/22/2024.
Fitness levels of complex LSRDRs of the octonions (youtube.com)