I personally like my machine learning algorithms to behave mathematically, especially when I give them mathematical data. For example, a fitness function with apparently just one local maximum value is a mathematical fitness function. It is even more mathematical if one can prove theorems about such a fitness function, or if one can completely describe its local maxima. Fitness functions that satisfy these mathematical properties seem more interpretable than fitness functions that do not, so people should investigate such functions for AI safety purposes.
My notion of an LSRDR satisfies these mathematical properties. To demonstrate the mathematical behavior of LSRDRs, let's see what happens when we take an LSRDR of the octonions.
Let $K$ denote either the field of real numbers or the field of complex numbers ($K$ could also be the division ring of quaternions, but for simplicity, let's not go there). If $A_1,\dots,A_r$ are $n\times n$ matrices over $K$, then an LSRDR ($L_{2,d}$-spectral radius dimensionality reduction) of $A_1,\dots,A_r$ is a collection $X_1,\dots,X_r$ of $d\times d$ matrices that locally maximizes the fitness level

$$\frac{\rho(A_1\otimes\overline{X_1}+\dots+A_r\otimes\overline{X_r})}{\rho(X_1\otimes\overline{X_1}+\dots+X_r\otimes\overline{X_r})^{1/2}}.$$

Here $\rho$ denotes the spectral radius function, $\otimes$ denotes the tensor product, and $\overline{Z}$ denotes the matrix obtained from $Z$ by replacing each entry with its complex conjugate. We shall call the maximum fitness level the $L_{2,d}$-spectral radius of $A_1,\dots,A_r$ over the field $K$, and we shall write $\rho^K_{2,d}(A_1,\dots,A_r)$ for this spectral radius.
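As a concrete illustration, this fitness level is easy to evaluate numerically. The sketch below (my own illustration, not code from this project; the helper names `spectral_radius` and `lsrdr_fitness` are made up) computes it with NumPy for random matrices:

```python
import numpy as np

def spectral_radius(M):
    # Largest absolute value of an eigenvalue of M.
    return np.max(np.abs(np.linalg.eigvals(M)))

def lsrdr_fitness(As, Xs):
    # Fitness level of X_1,...,X_r as a reduction of A_1,...,A_r:
    # rho(sum_i A_i (x) conj(X_i)) / rho(sum_i X_i (x) conj(X_i))^(1/2).
    num = spectral_radius(sum(np.kron(A, np.conj(X)) for A, X in zip(As, Xs)))
    den = spectral_radius(sum(np.kron(X, np.conj(X)) for X in Xs)) ** 0.5
    return num / den

# Example with random matrices (r = 2, n = 4, d = 2); an LSRDR would be
# a tuple (X_1, X_2) at which this fitness is locally maximal.
rng = np.random.default_rng(0)
As = [rng.standard_normal((4, 4)) for _ in range(2)]
Xs = [rng.standard_normal((2, 2)) for _ in range(2)]
print(lsrdr_fitness(As, Xs))
```

Note that the fitness is invariant under rescaling all of the $X_i$ by a common non-zero scalar, so only the "direction" of $(X_1,\dots,X_r)$ matters.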
Define the linear superoperator $\Gamma(A_1,\dots,A_r;X_1,\dots,X_r)$ by setting

$$\Gamma(A_1,\dots,A_r;X_1,\dots,X_r)(X)=A_1XX_1^*+\dots+A_rXX_r^*,$$

and set $\Phi(X_1,\dots,X_r)=\Gamma(X_1,\dots,X_r;X_1,\dots,X_r)$. Then the fitness level of $X_1,\dots,X_r$ is $\rho(\Gamma(A_1,\dots,A_r;X_1,\dots,X_r))/\rho(\Phi(X_1,\dots,X_r))^{1/2}$.
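Under row-major vectorization, $\mathrm{vec}(AXB)=(A\otimes B^{T})\,\mathrm{vec}(X)$, so the matrix of $\Gamma(A_1,\dots,A_r;X_1,\dots,X_r)$ is $A_1\otimes\overline{X_1}+\dots+A_r\otimes\overline{X_r}$, exactly the matrix appearing in the numerator of the fitness level. A quick numerical sanity check of this identity (a sketch with random matrices):

```python
import numpy as np

rng = np.random.default_rng(1)
r, n, d = 3, 4, 2
As = [rng.standard_normal((n, n)) for _ in range(r)]
Xs = [rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
      for _ in range(r)]

def Gamma(X):
    # Gamma(A_1,...,A_r; X_1,...,X_r)(X) = sum_i A_i X X_i^*
    return sum(A @ X @ Xi.conj().T for A, Xi in zip(As, Xs))

# Matrix of Gamma under row-major vectorization: sum_i A_i (x) conj(X_i).
M = sum(np.kron(A, Xi.conj()) for A, Xi in zip(As, Xs))

X = rng.standard_normal((n, d))
# Applying Gamma directly agrees with multiplying by its matrix.
assert np.allclose(Gamma(X).reshape(-1), M @ X.reshape(-1))
```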
Suppose that $V$ is an 8-dimensional real inner product space. Then octonionic multiplication is the unique-up-to-isomorphism bilinear binary operation $*$ on $V$ together with a unit $1$ such that $\|x*y\|=\|x\|\cdot\|y\|$ and $1*x=x*1=x$ for all $x,y\in V$. If we drop the condition that the octonions have a unit, then we do not quite have this uniqueness result.
We say that an octonion-like algebra is an 8-dimensional real inner product space $V$ together with a bilinear operation $*$ such that $\|x*y\|=\|x\|\cdot\|y\|$ for all $x,y$ (by the above remark, such an operation is no longer unique up to isomorphism).
Let V be a specific octonion-like algebra.
Suppose now that $e_1,\dots,e_8$ is an orthonormal basis for $V$ (this does not need to be the standard basis). Then for each $j\in\{1,\dots,8\}$, let $A_j$ be the linear operator from $V$ to $V$ defined by setting $A_jv=e_j*v$ for all vectors $v$. Every non-zero linear combination of $A_1,\dots,A_8$ is a conformal mapping (that is, it preserves angles). Now that we have turned the octonion-like algebra into matrices, we can take an LSRDR of it. When doing so, we do not need to worry about the choice of orthonormal basis $e_1,\dots,e_8$, since everything can be formulated in a coordinate-free manner.
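For readers who want to experiment, here is one concrete way to build such operators (a sketch assuming the standard octonions, obtained by the Cayley-Dickson construction over the quaternions; the helper names are made up):

```python
import numpy as np

def quat_mul(a, b):
    # Hamilton product of quaternions represented as arrays [w, x, y, z].
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def quat_conj(a):
    return np.array([a[0], -a[1], -a[2], -a[3]])

def oct_mul(x, y):
    # Cayley-Dickson: (a, b)(c, d) = (ac - d*b, da + bc*) on quaternion pairs,
    # where * is quaternion conjugation.
    a, b = x[:4], x[4:]
    c, d = y[:4], y[4:]
    return np.concatenate([
        quat_mul(a, c) - quat_mul(quat_conj(d), b),
        quat_mul(d, a) + quat_mul(b, quat_conj(c)),
    ])

# The operators A_j : v -> e_j * v for the standard basis e_1, ..., e_8.
E = np.eye(8)
A = [np.column_stack([oct_mul(E[j], E[k]) for k in range(8)]) for j in range(8)]

# Each A_j is orthogonal, and any non-zero linear combination of the A_j
# is a scalar multiple of an orthogonal matrix, i.e. a conformal map.
for Aj in A:
    assert np.allclose(Aj.T @ Aj, np.eye(8))
c = np.random.default_rng(2).standard_normal(8)
L = sum(cj * Aj for cj, Aj in zip(c, A))
assert np.allclose(L.T @ L, np.linalg.norm(c)**2 * np.eye(8))
```

The final assertion is exactly the conformality claim: $L=\sum_j c_jA_j$ satisfies $L^TL=\|c\|^2I$, so $L$ preserves angles whenever $c\neq 0$.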
Empirical Observation (from computer calculations): Suppose that $1\le d\le 8$ and $K$ is the field of real numbers. Then the following are equivalent.

1. The $d\times d$ matrices $X_1,\dots,X_8$ are an LSRDR of $A_1,\dots,A_8$ over $K$ where $A_1\otimes X_1+\dots+A_8\otimes X_8$ has a unique real dominant eigenvalue.

2. There exist matrices $R,S$ where $X_j=RA_jS$ for all $j$ and where $P=SR$ is an orthogonal projection matrix.

In this case, $\rho^K_{2,d}(A_1,\dots,A_8)=\sqrt{d}$, and this fitness level is attained by the matrices $X_1,\dots,X_8$ in the above equivalent statements. Observe that the superoperator $\Gamma(A_1,\dots,A_8;PA_1P,\dots,PA_8P)$ is similar to a direct sum of $\Gamma(A_1,\dots,A_8;X_1,\dots,X_8)$ and a zero operator. Moreover, the projection matrix $P$ is a dominant eigenvector of $\Gamma(A_1,\dots,A_8;PA_1P,\dots,PA_8P)$ and of $\Phi(PA_1P,\dots,PA_8P)$ as well.
I have no mathematical proof of the above fact though.
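While there is no proof, the special case $d=8$ is easy to check numerically: taking $P=I$ (so $X_j=A_j$), the numerator and denominator of the fitness level involve the same matrix $\sum_j A_j\otimes A_j$, and the fitness should come out to $\sqrt{8}$. A sketch of this check (rebuilding the operators $A_j$ from the standard octonions via Cayley-Dickson, as above):

```python
import numpy as np

def quat_mul(a, b):
    # Hamilton product of quaternions [w, x, y, z].
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def quat_conj(a):
    return np.array([a[0], -a[1], -a[2], -a[3]])

def oct_mul(x, y):
    # Cayley-Dickson construction of octonion multiplication.
    a, b, c, d = x[:4], x[4:], y[:4], y[4:]
    return np.concatenate([quat_mul(a, c) - quat_mul(quat_conj(d), b),
                           quat_mul(d, a) + quat_mul(b, quat_conj(c))])

E = np.eye(8)
A = [np.column_stack([oct_mul(E[j], E[k]) for k in range(8)]) for j in range(8)]

def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

# For d = 8 take P = I, so X_j = A_j; the fitness level is then
# rho(sum_j A_j (x) A_j) / rho(sum_j A_j (x) A_j)^(1/2) = rho(...)^(1/2).
M = sum(np.kron(Aj, Aj) for Aj in A)
fitness = spectral_radius(M) / spectral_radius(M) ** 0.5
print(fitness, np.sqrt(8))  # these should agree numerically
```

Here $\sum_j A_jA_j^T=8I$ since each $A_j$ is orthogonal, so the identity matrix is an eigenvector of the superoperator with eigenvalue $8$, which makes $\rho(\sum_j A_j\otimes A_j)=8$ and the fitness $8/\sqrt{8}=\sqrt{8}$.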
Now suppose that $K=\mathbb{C}$. Then my computer calculations yield the following complex $L_{2,d}$-spectral radii:

$$(\rho^{\mathbb{C}}_{2,j}(A_1,\dots,A_8))_{j=1}^{8}=(2,\,4,\,2+\sqrt{8},\,5.4676355784\dots,\,6.1977259251\dots,\,4+\sqrt{8},\,7.2628726081\dots,\,8).$$
Each time that I have trained a complex LSRDR of A1,…,A8, I was able to find a fitness level that is not just a local optimum but also a global optimum.
In the real case, by contrast, I have a complete description of the LSRDRs of $A_1,\dots,A_8$. This demonstrates that the octonion-like algebras are elegant mathematical structures and that LSRDRs behave mathematically in a manner that is compatible with the structure of the octonion-like algebras.
I have made a few YouTube videos that animate the process of gradient ascent to maximize the fitness level.
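For a rough sense of what such a maximization looks like in code, here is a toy sketch (my own, not the code behind the videos) that climbs the fitness level by naive finite-difference gradient ascent on a small random real example:

```python
import numpy as np

def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

def lsrdr_fitness(As, Xs):
    # rho(sum_i A_i (x) conj(X_i)) / rho(sum_i X_i (x) conj(X_i))^(1/2)
    num = spectral_radius(sum(np.kron(A, np.conj(X)) for A, X in zip(As, Xs)))
    den = spectral_radius(sum(np.kron(X, np.conj(X)) for X in Xs)) ** 0.5
    return num / den

rng = np.random.default_rng(3)
r, n, d = 2, 3, 2
As = [rng.standard_normal((n, n)) for _ in range(r)]
theta = rng.standard_normal(r * d * d) * 0.5  # flattened X_1, ..., X_r

def f(theta):
    return lsrdr_fitness(As, list(theta.reshape(r, d, d)))

# Naive forward-difference gradient ascent; real software would use
# automatic differentiation instead.
eps, lr = 1e-6, 0.05
f0 = f(theta)
best = f0
for step in range(200):
    grad = np.array([(f(theta + eps * e) - f(theta)) / eps
                     for e in np.eye(theta.size)])
    theta = theta + lr * grad
    best = max(best, f(theta))
```

After the loop, `best` records the highest fitness level seen along the ascent; for these operators one would hope it approaches a local (and, empirically, global) maximum.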
Edit: I have made some corrections to this post on 9/22/2024.
Fitness levels of complex LSRDRs of the octonions (youtube.com)