Here is an example of what might happen. Suppose that for each $u_j$, we select an orthonormal basis $e_{j,1},\dots,e_{j,s}$ for $V$. Let $R=\{(u_j,e_{j,k}):1\le j\le n,\ 1\le k\le s\}$.
Then for each quantum channel $E$, by the concavity of the logarithm function (which is equivalent to the arithmetic-geometric mean inequality), we have
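$$\begin{aligned}
L(R,E)&=\sum_{j=1}^{n}\sum_{k=1}^{s}-\log\big(\langle E(u_ju_j^*)e_{j,k},e_{j,k}\rangle\big)\\
&\ge\sum_{j=1}^{n}-s\log\Big(\frac{1}{s}\sum_{k=1}^{s}\langle E(u_ju_j^*)e_{j,k},e_{j,k}\rangle\Big)\\
&=\sum_{j=1}^{n}-s\log\Big(\frac{\operatorname{Tr}(E(u_ju_j^*))}{s}\Big).
\end{aligned}$$
Since $E$ is trace preserving, $\operatorname{Tr}(E(u_ju_j^*))=\|u_j\|^2$, so this lower bound does not depend on $E$. Here, equality is reached if and only if
$\langle E(u_ju_j^*)e_{j,k},e_{j,k}\rangle=\langle E(u_ju_j^*)e_{j,l},e_{j,l}\rangle$ for each $j,k,l$, and this equality is achieved by the channel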
defined by $E(X)=\operatorname{Tr}(X)\cdot I/s$, which is known as the completely depolarizing channel. This is the channel that always takes a quantum state and returns the completely mixed state. On the other hand, the channel $E$ has maximum Choi rank, since the Choi representation of $E$ is just a scalar multiple of the identity matrix, which has full rank. This example is not unexpected: for each input in $R$, the possible outputs span the entire space $V$ evenly, so a particular input gives no information about the output except that the output could be anything. This example shows that the channels that locally minimize the loss function $L(R,E)$ are the channels that give us a sort of linear regression of $R$, but one that takes uncertainty in the output into consideration, so the predicted output for a state is a mixed state rather than a pure state.
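The claim above can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes the $u_j$ are unit vectors (so the lower bound becomes $n\,s\log s$), picks random inputs and random orthonormal bases, and compares the loss of the completely depolarizing channel with the loss of a randomly generated channel. The helper names (`random_channel_kraus`, `loss`, `demo`) and the dimensions are illustrative assumptions.

```python
import numpy as np

def random_unitary_columns(rows, cols, rng):
    """Random complex matrix with orthonormal columns (requires rows >= cols)."""
    g = rng.normal(size=(rows, cols)) + 1j * rng.normal(size=(rows, cols))
    q, _ = np.linalg.qr(g)  # reduced QR: q has shape (rows, cols), q^* q = I
    return q

def random_channel_kraus(d, s, r, rng):
    """Kraus operators A_1..A_r (each s x d) with sum_i A_i^* A_i = I_d."""
    q = random_unitary_columns(r * s, d, rng)
    return q.reshape(r, s, d)  # consecutive blocks of s rows are the A_i

def loss(channel, inputs, bases):
    """L(R,E) = sum_j sum_k -log <E(u_j u_j^*) e_{j,k}, e_{j,k}>."""
    total = 0.0
    for u, basis in zip(inputs, bases):
        rho = channel(np.outer(u, u.conj()))
        for e in basis.T:  # iterate over the columns e_{j,k} of the basis
            total += -np.log(np.real(e.conj() @ rho @ e))
    return total

def demo(n=5, d=3, s=4, r=3, seed=0):
    rng = np.random.default_rng(seed)
    # Assumption: unit input vectors u_j in C^d, one orthonormal basis of C^s each.
    inputs = [random_unitary_columns(d, 1, rng)[:, 0] for _ in range(n)]
    bases = [random_unitary_columns(s, s, rng) for _ in range(n)]

    depolarizing = lambda x: np.trace(x) * np.eye(s) / s
    kraus = random_channel_kraus(d, s, r, rng)
    random_channel = lambda x: sum(a @ x @ a.conj().T for a in kraus)

    bound = n * s * np.log(s)  # lower bound when every u_j has norm 1
    return bound, loss(depolarizing, inputs, bases), loss(random_channel, inputs, bases)

bound, depol_loss, random_loss = demo()
print("lower bound        :", bound)
print("depolarizing loss  :", depol_loss)
print("random channel loss:", random_loss)
```

Under these assumptions the depolarizing loss should coincide with the bound up to floating-point error, while the random channel's loss should be no smaller.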