Let’s compute some inner products and gradients.
Set up: Let $K$ denote either the field of real numbers or the field of complex numbers. Let $d_1,\dots,d_n$ be positive integers, and let $m_0,\dots,m_n$ be a sequence of positive integers with $m_0=m_n=1$. Suppose that $X_{i,j}$ is an $m_{i-1}\times m_i$ matrix over $K$ whenever $1\le i\le n$ and $1\le j\le d_i$. Then from the matrices $X_{i,j}$, we can define a $d_1\times\dots\times d_n$ tensor $T((X_{i,j})_{i,j})=(X_{1,i_1}\cdots X_{n,i_n})_{i_1,\dots,i_n}$; since $m_0=m_n=1$, each product $X_{1,i_1}\cdots X_{n,i_n}$ is a $1\times 1$ matrix, i.e. a scalar. I have been running computer experiments where I use this tensor to approximate other tensors by minimizing the $\ell_2$-distance. I have not seen this tensor approximation algorithm elsewhere, but perhaps someone else has produced this construction before. In previous shortform posts on this site, I have given evidence that this tensor dimensionality reduction behaves well, and in this post we will focus on ways to compute with the tensors $T((X_{i,j})_{i,j})$, namely the inner product of such tensors and the gradient of the inner product with respect to the matrices $(X_{i,j})_{i,j}$.
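To make the construction concrete, here is a minimal NumPy sketch of $T((X_{i,j})_{i,j})$; the helper names (`random_factors`, `T`) and the example shapes are my own choices for illustration, not anything from the experiments described above.

```python
import numpy as np

def random_factors(ds, ms, rng):
    """Matrices X[i][j] of shape (m_{i-1}, m_i) for 1 <= i <= n, 1 <= j <= d_i
    (0-indexed in code: X[i][j] has shape (ms[i], ms[i+1]))."""
    return [[rng.standard_normal((ms[i], ms[i + 1])) for _ in range(d)]
            for i, d in enumerate(ds)]

def T(X):
    """Assemble the d_1 x ... x d_n tensor whose (i_1,...,i_n) entry is the
    scalar X_{1,i_1} ... X_{n,i_n} (a 1 x 1 matrix since m_0 = m_n = 1)."""
    ds = [len(Xi) for Xi in X]
    out = np.empty(ds)
    for idx in np.ndindex(*ds):
        prod = np.eye(1)
        for i, j in enumerate(idx):
            prod = prod @ X[i][j]
        out[idx] = prod[0, 0]
    return out

rng = np.random.default_rng(0)
ds, ms = [2, 3, 2], [1, 4, 4, 1]   # n = 3, m_0 = m_3 = 1
X = random_factors(ds, ms, rng)
print(T(X).shape)                   # (2, 3, 2)
```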
Notation: If $A_1,\dots,A_r,B_1,\dots,B_r$ are matrices, then let $\Gamma(A_1,\dots,A_r;B_1,\dots,B_r)$ denote the superoperator defined by letting $\Gamma(A_1,\dots,A_r;B_1,\dots,B_r)(X)=A_1XB_1^*+\dots+A_rXB_r^*$. Let $\Phi(A_1,\dots,A_r)=\Gamma(A_1,\dots,A_r;A_1,\dots,A_r)$.
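In code, these superoperators can be represented as closures; a small sketch (the names `Gamma` and `Phi` are again just illustrative):

```python
def Gamma(As, Bs):
    """The superoperator X -> A_1 X B_1^* + ... + A_r X B_r^*,
    where B^* denotes the conjugate transpose of B."""
    def op(X):
        return sum(A @ X @ B.conj().T for A, B in zip(As, Bs))
    return op

def Phi(As):
    """Phi(A_1,...,A_r) = Gamma(A_1,...,A_r; A_1,...,A_r)."""
    return Gamma(As, As)
```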
Inner product: Here is the computation of the inner product of our tensors.
$\langle T((A_{i,j})_{i,j}),T((B_{i,j})_{i,j})\rangle$
$=\langle(A_{1,i_1}\cdots A_{n,i_n})_{i_1,\dots,i_n},(B_{1,i_1}\cdots B_{n,i_n})_{i_1,\dots,i_n}\rangle$
$=\sum_{i_1,\dots,i_n}A_{1,i_1}\cdots A_{n,i_n}(B_{1,i_1}\cdots B_{n,i_n})^*$
$=\sum_{i_1,\dots,i_n}A_{1,i_1}\cdots A_{n,i_n}B_{n,i_n}^*\cdots B_{1,i_1}^*$
$=\Gamma(A_{1,1},\dots,A_{1,d_1};B_{1,1},\dots,B_{1,d_1})\cdots\Gamma(A_{n,1},\dots,A_{n,d_n};B_{n,1},\dots,B_{n,d_n})(1),$
where the last equality follows by peeling off one sum at a time, starting with the sum over $i_n$, and where $1$ denotes the $1\times 1$ identity matrix.
In particular, $\|T((A_{i,j})_{i,j})\|^2=\Phi(A_{1,1},\dots,A_{1,d_1})\cdots\Phi(A_{n,1},\dots,A_{n,d_n})(1)$.
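As a numerical sanity check of these formulas, here is a sketch continuing the code above (so it assumes the hypothetical helpers `random_factors`, `T`, `Gamma`, and `Phi`); the composition $\Gamma_1\cdots\Gamma_n$ is applied to the $1\times 1$ identity by iterating from the last factor to the first.

```python
A = random_factors(ds, ms, rng)
B = random_factors(ds, ms, rng)

# Apply Gamma_n first, then Gamma_{n-1}, ..., then Gamma_1.
val = np.eye(1)
for Ai, Bi in zip(reversed(A), reversed(B)):
    val = Gamma(Ai, Bi)(val)

# Naive entrywise inner product (real case, so no conjugation needed).
naive = np.sum(T(A) * T(B))
print(np.allclose(val[0, 0], naive))   # True

# Squared norm via Phi.
nrm = np.eye(1)
for Ai in reversed(A):
    nrm = Phi(Ai)(nrm)
print(np.allclose(nrm[0, 0], np.sum(T(A) ** 2)))   # True
```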
Gradient: Observe that $\nabla_X\operatorname{Tr}(AX)=A^T$; we will see shortly that the cyclicity of the trace is also useful for calculating the gradient.
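A quick finite-difference check of the identity $\nabla_X\operatorname{Tr}(AX)=A^T$ (a sketch for the real case; the step size `h` and the shapes are arbitrary):

```python
A = rng.standard_normal((3, 3))
X = rng.standard_normal((3, 3))
grad, h = np.empty((3, 3)), 1e-6
for p in range(3):
    for q in range(3):
        E = np.zeros((3, 3)); E[p, q] = h
        grad[p, q] = (np.trace(A @ (X + E)) - np.trace(A @ X)) / h
print(np.allclose(grad, A.T, atol=1e-4))   # True
```

Here is my manual calculation of the gradient of the inner product of our tensors.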
$\nabla_{X_{\alpha,\beta}}\langle T((X_{i,j})_{i,j}),T((A_{i,j})_{i,j})\rangle$
$=\nabla_{X_{\alpha,\beta}}\sum_{i_1,\dots,i_n}X_{1,i_1}\cdots X_{n,i_n}A_{n,i_n}^*\cdots A_{1,i_1}^*$
$=\nabla_{X_{\alpha,\beta}}\operatorname{Tr}\bigl(\sum_{i_1,\dots,i_n}X_{1,i_1}\cdots X_{n,i_n}A_{n,i_n}^*\cdots A_{1,i_1}^*\bigr)$ (each summand is $1\times 1$, so it equals its trace)
$=\nabla_{X_{\alpha,\beta}}\operatorname{Tr}\bigl(\sum_{i_1,\dots,i_n}X_{\alpha,i_\alpha}\cdots X_{n,i_n}A_{n,i_n}^*\cdots A_{\alpha+1,i_{\alpha+1}}^*A_{\alpha,i_\alpha}^*A_{\alpha-1,i_{\alpha-1}}^*\cdots A_{1,i_1}^*X_{1,i_1}\cdots X_{\alpha-1,i_{\alpha-1}}\bigr)$ (by cyclicity of the trace)
$=\nabla_{X_{\alpha,\beta}}\operatorname{Tr}\bigl(\sum_{i_{\alpha+1},\dots,i_n,i_1,\dots,i_{\alpha-1}}X_{\alpha,\beta}X_{\alpha+1,i_{\alpha+1}}\cdots X_{n,i_n}A_{n,i_n}^*\cdots A_{\alpha+1,i_{\alpha+1}}^*A_{\alpha,\beta}^*A_{\alpha-1,i_{\alpha-1}}^*\cdots A_{1,i_1}^*X_{1,i_1}\cdots X_{\alpha-1,i_{\alpha-1}}\bigr)$ (only the terms with $i_\alpha=\beta$ depend on $X_{\alpha,\beta}$)
$=\bigl(\sum_{i_{\alpha+1},\dots,i_n,i_1,\dots,i_{\alpha-1}}X_{\alpha+1,i_{\alpha+1}}\cdots X_{n,i_n}A_{n,i_n}^*\cdots A_{\alpha+1,i_{\alpha+1}}^*A_{\alpha,\beta}^*A_{\alpha-1,i_{\alpha-1}}^*\cdots A_{1,i_1}^*X_{1,i_1}\cdots X_{\alpha-1,i_{\alpha-1}}\bigr)^T$ (since $\nabla_X\operatorname{Tr}(XM)=M^T$)
$=\bigl[\bigl(\Gamma(X_{\alpha+1,1},\dots,X_{\alpha+1,d_{\alpha+1}};A_{\alpha+1,1},\dots,A_{\alpha+1,d_{\alpha+1}})\cdots\Gamma(X_{n,1},\dots,X_{n,d_n};A_{n,1},\dots,A_{n,d_n})(1)\bigr)\,A_{\alpha,\beta}^*\,\bigl(\Gamma(A_{\alpha-1,1}^*,\dots,A_{\alpha-1,d_{\alpha-1}}^*;X_{\alpha-1,1}^*,\dots,X_{\alpha-1,d_{\alpha-1}}^*)\cdots\Gamma(A_{1,1}^*,\dots,A_{1,d_1}^*;X_{1,1}^*,\dots,X_{1,d_1}^*)(1)\bigr)\bigr]^T.$
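Here is a sketch that checks this final formula against a finite difference, continuing the code above (real case only; `a` and `b` are 0-indexed versions of $\alpha$ and $\beta$, and `grad_formula` is my own hypothetical helper):

```python
def grad_formula(X, A, a, b):
    """Gradient of <T(X), T(A)> with respect to X[a][b] via the formula above."""
    # Left factor: Gamma(X_{a+1}; A_{a+1}) ... Gamma(X_n; A_n) applied to 1.
    left = np.eye(1)
    for Xi, Ai in zip(reversed(X[a + 1:]), reversed(A[a + 1:])):
        left = Gamma(Xi, Ai)(left)
    # Right factor: Gamma(A_{a-1}^*; X_{a-1}^*) ... Gamma(A_1^*; X_1^*) applied to 1.
    right = np.eye(1)
    for Ai, Xi in zip(A[:a], X[:a]):
        right = Gamma([M.conj().T for M in Ai], [M.conj().T for M in Xi])(right)
    return (left @ A[a][b].conj().T @ right).T

a, b = 1, 2
X = random_factors(ds, ms, rng)
A = random_factors(ds, ms, rng)
g = grad_formula(X, A, a, b)

# Finite-difference check of one entry of the gradient.
h, (p, q) = 1e-6, (0, 3)
Xp = [[M.copy() for M in Xi] for Xi in X]
Xp[a][b][p, q] += h
fd = (np.sum(T(Xp) * T(A)) - np.sum(T(X) * T(A))) / h
print(np.allclose(g[p, q], fd, atol=1e-4))   # True
```

Note that computing the gradient this way reuses the same superoperator compositions as the inner product, so the left and right factors for all $\alpha$ can be cached in a single sweep rather than recomputed per matrix.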