Let’s compute some inner products and gradients.
Set up: Let $K$ denote either the field of real numbers or the field of complex numbers. Let $d_1,\dots,d_n$ be positive integers, and let $m_0,\dots,m_n$ be a sequence of positive integers with $m_0=m_n=1$. Suppose that $X_{i,j}$ is an $m_{i-1}\times m_i$ matrix over $K$ whenever $1\le i\le n$ and $1\le j\le d_i$. Then from the matrices $X_{i,j}$, we can define a $d_1\times\dots\times d_n$ tensor $T((X_{i,j})_{i,j})=(X_{1,i_1}\cdots X_{n,i_n})_{i_1,\dots,i_n}$; since $m_0=m_n=1$, each product $X_{1,i_1}\cdots X_{n,i_n}$ is a $1\times 1$ matrix, i.e. a scalar. I have been running computer experiments where I use this tensor to approximate other tensors by minimizing the $\ell_2$-distance. I have not seen this tensor approximation algorithm elsewhere, but perhaps someone else has produced this construction before. In previous shortform posts on this site, I have given evidence that this tensor dimensionality reduction behaves well, and in this post we will focus on ways to compute with the tensors $T((X_{i,j})_{i,j})$, namely the inner product of such tensors and the gradient of the inner product with respect to the matrices $(X_{i,j})_{i,j}$.
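To make the construction concrete, here is a minimal NumPy sketch of $T((X_{i,j})_{i,j})$; the helper names (`random_factors`, `T`) and the example shapes are my own choices for illustration, not anything from the experiments described above.

```python
import numpy as np

def random_factors(ds, ms, rng):
    """Matrices X[i][j] of shape (m_{i-1}, m_i) for 1 <= i <= n, 1 <= j <= d_i
    (0-indexed in code: X[i][j] has shape (ms[i], ms[i+1]))."""
    return [[rng.standard_normal((ms[i], ms[i + 1])) for _ in range(d)]
            for i, d in enumerate(ds)]

def T(X):
    """Assemble the d_1 x ... x d_n tensor whose (i_1,...,i_n) entry is the
    scalar X_{1,i_1} ... X_{n,i_n} (a 1 x 1 matrix since m_0 = m_n = 1)."""
    ds = [len(Xi) for Xi in X]
    out = np.empty(ds)
    for idx in np.ndindex(*ds):
        prod = np.eye(1)
        for i, j in enumerate(idx):
            prod = prod @ X[i][j]
        out[idx] = prod[0, 0]
    return out

rng = np.random.default_rng(0)
ds, ms = [2, 3, 2], [1, 4, 4, 1]   # n = 3, m_0 = m_3 = 1
X = random_factors(ds, ms, rng)
print(T(X).shape)                   # (2, 3, 2)
```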
Notation: If $A_1,\dots,A_r,B_1,\dots,B_r$ are matrices, then let $\Gamma(A_1,\dots,A_r;B_1,\dots,B_r)$ denote the superoperator defined by letting $\Gamma(A_1,\dots,A_r;B_1,\dots,B_r)(X)=A_1XB_1^*+\dots+A_rXB_r^*$. Let $\Phi(A_1,\dots,A_r)=\Gamma(A_1,\dots,A_r;A_1,\dots,A_r)$.
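In code, these superoperators can be represented as closures; a small sketch (the names `Gamma` and `Phi` are again just illustrative):

```python
def Gamma(As, Bs):
    """The superoperator X -> A_1 X B_1^* + ... + A_r X B_r^*,
    where B^* denotes the conjugate transpose of B."""
    def op(X):
        return sum(A @ X @ B.conj().T for A, B in zip(As, Bs))
    return op

def Phi(As):
    """Phi(A_1,...,A_r) = Gamma(A_1,...,A_r; A_1,...,A_r)."""
    return Gamma(As, As)
```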
Inner product: Here is the computation of the inner product of our tensors.
$\langle T((A_{i,j})_{i,j}),T((B_{i,j})_{i,j})\rangle$
$=\langle(A_{1,i_1}\cdots A_{n,i_n})_{i_1,\dots,i_n},(B_{1,i_1}\cdots B_{n,i_n})_{i_1,\dots,i_n}\rangle$
$=\sum_{i_1,\dots,i_n}A_{1,i_1}\cdots A_{n,i_n}(B_{1,i_1}\cdots B_{n,i_n})^*$
$=\sum_{i_1,\dots,i_n}A_{1,i_1}\cdots A_{n,i_n}B_{n,i_n}^*\cdots B_{1,i_1}^*$
$=\Gamma(A_{1,1},\dots,A_{1,d_1};B_{1,1},\dots,B_{1,d_1})\cdots\Gamma(A_{n,1},\dots,A_{n,d_n};B_{n,1},\dots,B_{n,d_n})(1),$
where the last equality follows by peeling off one sum at a time, starting with the sum over $i_n$, and where $1$ denotes the $1\times 1$ identity matrix.
In particular, $\|T((A_{i,j})_{i,j})\|^2=\Phi(A_{1,1},\dots,A_{1,d_1})\cdots\Phi(A_{n,1},\dots,A_{n,d_n})(1)$.
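As a numerical sanity check of these formulas, here is a sketch continuing the code above (so it assumes the hypothetical helpers `random_factors`, `T`, `Gamma`, and `Phi`); the composition $\Gamma_1\cdots\Gamma_n$ is applied to the $1\times 1$ identity by iterating from the last factor to the first.

```python
A = random_factors(ds, ms, rng)
B = random_factors(ds, ms, rng)

# Apply Gamma_n first, then Gamma_{n-1}, ..., then Gamma_1.
val = np.eye(1)
for Ai, Bi in zip(reversed(A), reversed(B)):
    val = Gamma(Ai, Bi)(val)

# Naive entrywise inner product (real case, so no conjugation needed).
naive = np.sum(T(A) * T(B))
print(np.allclose(val[0, 0], naive))   # True

# Squared norm via Phi.
nrm = np.eye(1)
for Ai in reversed(A):
    nrm = Phi(Ai)(nrm)
print(np.allclose(nrm[0, 0], np.sum(T(A) ** 2)))   # True
```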
Gradient: Observe that $\nabla_X\operatorname{Tr}(AX)=A^T$; we will see shortly that the cyclicity of the trace is also useful for calculating the gradient.
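A quick finite-difference check of the identity $\nabla_X\operatorname{Tr}(AX)=A^T$ (a sketch for the real case; the step size `h` and the shapes are arbitrary):

```python
A = rng.standard_normal((3, 3))
X = rng.standard_normal((3, 3))
grad, h = np.empty((3, 3)), 1e-6
for p in range(3):
    for q in range(3):
        E = np.zeros((3, 3)); E[p, q] = h
        grad[p, q] = (np.trace(A @ (X + E)) - np.trace(A @ X)) / h
print(np.allclose(grad, A.T, atol=1e-4))   # True
```

Here is my manual calculation of the gradient of the inner product of our tensors.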
$\nabla_{X_{\alpha,\beta}}\langle T((X_{i,j})_{i,j}),T((A_{i,j})_{i,j})\rangle$
$=\nabla_{X_{\alpha,\beta}}\sum_{i_1,\dots,i_n}X_{1,i_1}\cdots X_{n,i_n}A_{n,i_n}^*\cdots A_{1,i_1}^*$
$=\nabla_{X_{\alpha,\beta}}\operatorname{Tr}\bigl(\sum_{i_1,\dots,i_n}X_{1,i_1}\cdots X_{n,i_n}A_{n,i_n}^*\cdots A_{1,i_1}^*\bigr)$ (each summand is $1\times 1$, so it equals its trace)
$=\nabla_{X_{\alpha,\beta}}\operatorname{Tr}\bigl(\sum_{i_1,\dots,i_n}X_{\alpha,i_\alpha}\cdots X_{n,i_n}A_{n,i_n}^*\cdots A_{\alpha+1,i_{\alpha+1}}^*A_{\alpha,i_\alpha}^*A_{\alpha-1,i_{\alpha-1}}^*\cdots A_{1,i_1}^*X_{1,i_1}\cdots X_{\alpha-1,i_{\alpha-1}}\bigr)$ (by cyclicity of the trace)
$=\nabla_{X_{\alpha,\beta}}\operatorname{Tr}\bigl(\sum_{i_{\alpha+1},\dots,i_n,i_1,\dots,i_{\alpha-1}}X_{\alpha,\beta}X_{\alpha+1,i_{\alpha+1}}\cdots X_{n,i_n}A_{n,i_n}^*\cdots A_{\alpha+1,i_{\alpha+1}}^*A_{\alpha,\beta}^*A_{\alpha-1,i_{\alpha-1}}^*\cdots A_{1,i_1}^*X_{1,i_1}\cdots X_{\alpha-1,i_{\alpha-1}}\bigr)$ (only the terms with $i_\alpha=\beta$ depend on $X_{\alpha,\beta}$)
$=\bigl(\sum_{i_{\alpha+1},\dots,i_n,i_1,\dots,i_{\alpha-1}}X_{\alpha+1,i_{\alpha+1}}\cdots X_{n,i_n}A_{n,i_n}^*\cdots A_{\alpha+1,i_{\alpha+1}}^*A_{\alpha,\beta}^*A_{\alpha-1,i_{\alpha-1}}^*\cdots A_{1,i_1}^*X_{1,i_1}\cdots X_{\alpha-1,i_{\alpha-1}}\bigr)^T$ (since $\nabla_X\operatorname{Tr}(XM)=M^T$)
$=\bigl[\bigl(\Gamma(X_{\alpha+1,1},\dots,X_{\alpha+1,d_{\alpha+1}};A_{\alpha+1,1},\dots,A_{\alpha+1,d_{\alpha+1}})\cdots\Gamma(X_{n,1},\dots,X_{n,d_n};A_{n,1},\dots,A_{n,d_n})(1)\bigr)\,A_{\alpha,\beta}^*\,\bigl(\Gamma(A_{\alpha-1,1}^*,\dots,A_{\alpha-1,d_{\alpha-1}}^*;X_{\alpha-1,1}^*,\dots,X_{\alpha-1,d_{\alpha-1}}^*)\cdots\Gamma(A_{1,1}^*,\dots,A_{1,d_1}^*;X_{1,1}^*,\dots,X_{1,d_1}^*)(1)\bigr)\bigr]^T.$
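Here is a sketch that checks this final formula against a finite difference, continuing the code above (real case only; `a` and `b` are 0-indexed versions of $\alpha$ and $\beta$, and `grad_formula` is my own hypothetical helper):

```python
def grad_formula(X, A, a, b):
    """Gradient of <T(X), T(A)> with respect to X[a][b] via the formula above."""
    # Left factor: Gamma(X_{a+1}; A_{a+1}) ... Gamma(X_n; A_n) applied to 1.
    left = np.eye(1)
    for Xi, Ai in zip(reversed(X[a + 1:]), reversed(A[a + 1:])):
        left = Gamma(Xi, Ai)(left)
    # Right factor: Gamma(A_{a-1}^*; X_{a-1}^*) ... Gamma(A_1^*; X_1^*) applied to 1.
    right = np.eye(1)
    for Ai, Xi in zip(A[:a], X[:a]):
        right = Gamma([M.conj().T for M in Ai], [M.conj().T for M in Xi])(right)
    return (left @ A[a][b].conj().T @ right).T

a, b = 1, 2
X = random_factors(ds, ms, rng)
A = random_factors(ds, ms, rng)
g = grad_formula(X, A, a, b)

# Finite-difference check of one entry of the gradient.
h, (p, q) = 1e-6, (0, 3)
Xp = [[M.copy() for M in Xi] for Xi in X]
Xp[a][b][p, q] += h
fd = (np.sum(T(Xp) * T(A)) - np.sum(T(X) * T(A))) / h
print(np.allclose(g[p, q], fd, atol=1e-4))   # True
```

Note that computing the gradient this way reuses the same superoperator compositions as the inner product, so the left and right factors for all $\alpha$ can be cached in a single sweep rather than recomputed per matrix.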