Actually, I have a little more to say:
Another way to think about higher-rank density matrices is as probability distributions over pure states; I think this is what Charlie Steiner’s comment is alluding to.
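So the rank-2 matrix from my previous comment can be thought of as $\frac{1}{2}|0\rangle\langle 0| + \frac{1}{2}|1\rangle\langle 1|$, i.e., an equal probability of observing each of $|0\rangle$ and $|1\rangle$. And, because $\frac{1}{2}|0\rangle\langle 0| + \frac{1}{2}|1\rangle\langle 1| = \frac{1}{2}|v\rangle\langle v| + \frac{1}{2}|w\rangle\langle w|$ for any orthonormal vectors $|v\rangle, |w\rangle$, again there’s nothing special about using the standard basis here (this is mathematically equivalent to the argument I made in the above comment about why you can use any basis for your measurement).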
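To make the basis-independence concrete, here is a minimal pure-Python sketch (no numerical library assumed). It builds $\sum_k p_k|\psi_k\rangle\langle\psi_k|$ from an ensemble and checks that the equal mixture of $|0\rangle, |1\rangle$ and the equal mixture of some other orthonormal pair give the same matrix. The helper names and the particular basis (angle `theta`, relative phase `phi`) are illustrative assumptions, not anything from the comment.

```python
import cmath
import math

def outer(v, w):
    """Return the matrix |v><w| as a list of rows (entries of w are conjugated)."""
    return [[vi * wj.conjugate() for wj in w] for vi in v]

def mixture(ensemble):
    """Density matrix sum_k p_k |psi_k><psi_k| for a list of (p_k, psi_k) pairs."""
    n = len(ensemble[0][1])
    rho = [[0j] * n for _ in range(n)]
    for p, psi in ensemble:
        proj = outer(psi, psi)
        for i in range(n):
            for j in range(n):
                rho[i][j] += p * proj[i][j]
    return rho

def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices given as lists of rows."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Equal mixture of the standard basis states |0> and |1>.
ket0, ket1 = [1 + 0j, 0j], [0j, 1 + 0j]
rho_std = mixture([(0.5, ket0), (0.5, ket1)])

# A hypothetical alternative orthonormal basis {v, w}: a rotation plus a relative phase.
theta, phi = 0.7, 1.3
v = [math.cos(theta) + 0j, cmath.exp(1j * phi) * math.sin(theta)]
w = [-math.sin(theta) + 0j, cmath.exp(1j * phi) * math.cos(theta)]
rho_other = mixture([(0.5, v), (0.5, w)])

half_identity = [[0.5, 0.0], [0.0, 0.5]]
print(close(rho_std, half_identity), close(rho_other, half_identity))  # True True
```

Both ensembles collapse to the same matrix $\frac{1}{2}I$, so the density matrix alone cannot tell you which ensemble "really" produced it.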
I always hated this point of view; it felt really hacky, and I always found it ugly and unmotivated to go from states to projections just for the sake of taking probability distributions.
The thing above about entanglement and decoherence, IMO, is a more elegant and natural way to see why you’d come up with this formalism. To be explicit, suppose you have the state $|0\rangle$, and there is an environment state that you don’t have access to; say it also begins in state $|0\rangle$. Initially everything is unentangled, so we begin in the state $|0\rangle \otimes |0\rangle = |00\rangle$. Then some unitary evolution happens that entangles us; say it takes $|00\rangle$ to the Bell state $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$.
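The comment leaves the entangling unitary unspecified. One standard choice, assumed here purely for illustration, is a Hadamard gate on your qubit followed by a CNOT with your qubit as control and the environment as target. This sketch checks that this particular unitary sends $|00\rangle$ to the Bell state, using the basis order $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ with your qubit first.

```python
import math

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]

def kron(a, b):
    """Kronecker (tensor) product of two square matrices given as lists of rows."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]   # Hadamard gate on a single qubit
I2 = [[1, 0], [0, 1]]
# CNOT: your qubit (first factor) controls, the environment (second factor) is the target.
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

ket00 = [1, 0, 0, 0]                   # |0> (you) tensor |0> (environment)
after_h = matvec(kron(H, I2), ket00)   # (|00> + |10>)/sqrt(2)
bell = matvec(CNOT, after_h)           # (|00> + |11>)/sqrt(2)
print([round(x, 6) for x in bell])     # [0.707107, 0.0, 0.0, 0.707107]
```

Any other unitary producing the same Bell state would do equally well for the argument; this one is just the textbook circuit.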
As we’ve seen, you should think of your state as being $\frac{1}{2}|0\rangle\langle 0| + \frac{1}{2}|1\rangle\langle 1| = \frac{1}{2}I$, and now it’s clear why this is the right framework for probabilistic mixtures of quantum states: it’s entirely natural to think of your part of the now-entangled system as “an equal chance of $|0\rangle$ and $|1\rangle$”, and this indeed gives us the right density matrix. It also immediately implies that you are forced to allow that it could be represented as “an equal chance of $|+\rangle$ and $|-\rangle$”, where $|\pm\rangle = \frac{1}{\sqrt{2}}(|0\rangle \pm |1\rangle)$, etc.
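"Your part" of the entangled state is obtained by tracing out the environment. A minimal sketch of that partial trace, assuming your qubit is the first tensor factor (a convention chosen here, not stated in the comment), confirming that it gives $\frac{1}{2}I$ and that the $|\pm\rangle$ ensemble yields the identical matrix:

```python
import math

def outer(v, w):
    """Return |v><w| as a list of rows (entries of w are conjugated)."""
    return [[vi * wj.conjugate() for wj in w] for vi in v]

def partial_trace_env(rho, d_sys=2, d_env=2):
    """Trace out the second tensor factor (the environment).

    Assumes the joint basis index is i_sys * d_env + i_env, i.e. the system comes first.
    """
    return [[sum(rho[i * d_env + k][j * d_env + k] for k in range(d_env))
             for j in range(d_sys)]
            for i in range(d_sys)]

def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices given as lists of rows."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

s = 1 / math.sqrt(2)
bell = [s + 0j, 0j, 0j, s + 0j]   # (|00> + |11>)/sqrt(2)
rho_sys = partial_trace_env(outer(bell, bell))

# The same reduced state, written as an equal mixture of |+> and |->.
plus, minus = [s + 0j, s + 0j], [s + 0j, -s + 0j]
rho_pm = [[0.5 * (p + m) for p, m in zip(rp, rm)]
          for rp, rm in zip(outer(plus, plus), outer(minus, minus))]

half_identity = [[0.5, 0.0], [0.0, 0.5]]
print(close(rho_sys, half_identity), close(rho_pm, half_identity))  # True True
```

The off-diagonal terms of the joint state, which carry the correlation with the environment, are exactly what the partial trace discards; that is where the "missing information" goes.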
But it makes it clear why we have this non-uniqueness of representation, or where the missing information went: we don’t just “have a probabilistic mixture of quantum states”, we have a small part of a big quantum system that we can’t see all of, so the best we can do is represent it (non-uniquely) as a probabilistic mixture of quantum states.
Now, you aren’t obliged to take this view, that the only reason we have any uncertainty about our quantum state is because of this sort of decoherence process, but it’s definitely a powerful idea.
You can switch back and forth between the two views (density matrices as (0,2) tensors or as operators), obviously, and sometimes you do, but I think the most natural reason to use the operator view is that the operators you get are trace-1 positive semidefinite (PSD) matrices, and there’s a lot of theory on PSD matrices waiting for you. Also, the natural maps on density matrices, the quantum channels (i.e., the completely positive trace-preserving maps), have a pretty nice representation in terms of conjugation when you think of density matrices as matrices: $\rho \mapsto \sum_i K_i \rho K_i^*$ for some operators $K_i$ that satisfy $\sum_i K_i^* K_i = I$.
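As a concrete instance of the conjugation form, here is a sketch using the amplitude-damping channel with Kraus operators $K_0 = \begin{pmatrix}1 & 0\\ 0 & \sqrt{1-\gamma}\end{pmatrix}$ and $K_1 = \begin{pmatrix}0 & \sqrt{\gamma}\\ 0 & 0\end{pmatrix}$. The choice of channel and the value `gamma = 0.3` are hypothetical illustrations. The code checks the completeness relation $\sum_i K_i^* K_i = I$ and that applying the channel to $|+\rangle\langle +|$ keeps the trace equal to 1.

```python
import math

def mat_mul(a, b):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def dagger(a):
    """Conjugate transpose, written K^* in the formula above."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]

def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def apply_channel(kraus, rho):
    """Compute sum_i K_i rho K_i^* for a list of Kraus operators."""
    out = [[0j] * len(rho) for _ in rho]
    for k in kraus:
        out = mat_add(out, mat_mul(mat_mul(k, rho), dagger(k)))
    return out

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

# Hypothetical example channel: amplitude damping with decay probability gamma.
gamma = 0.3
K0 = [[1 + 0j, 0j], [0j, math.sqrt(1 - gamma) + 0j]]
K1 = [[0j, math.sqrt(gamma) + 0j], [0j, 0j]]
kraus = [K0, K1]

# Completeness relation sum_i K_i^* K_i; it should equal the identity.
completeness = mat_add(mat_mul(dagger(K0), K0), mat_mul(dagger(K1), K1))

rho_plus = [[0.5 + 0j, 0.5 + 0j], [0.5 + 0j, 0.5 + 0j]]   # |+><+|
rho_out = apply_channel(kraus, rho_plus)
print(trace(rho_out))
```

The completeness relation is exactly what makes the map trace preserving, since $\operatorname{tr}\sum_i K_i \rho K_i^* = \operatorname{tr}\big(\rho \sum_i K_i^* K_i\big)$.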
Obviously all of this translates to the (0,2) tensor view, but a lot of theory had already been built for thinking of these as linear maps on matrix spaces (or C*-algebras, or whatever fancier generalizations mathematicians had already been looking at).