The determinant is a rather bizarre concept when one first encounters it. Given a square n×n matrix A, it’s defined as
detA=∑σ∈Sn(−1)σn∏i=1Aiσ(i)
where Sn is the symmetric group on n elements—the collection of all permutations of a set of n distinct objects—and (−1)σ is the sign of the permutation σ. Aij represents the entry of the matrix A on the ith row and jth column.
This definition means that the determinant is a sum that has one term for each permutation σ of the columns of the matrix A, and these terms are the product of n entries of the matrix up to a funny sign (−1)σ depending on the specific permutation σ.
To understand this definition, it’s essential that we first understand what the sign of a permtuation is.
Signs
The sign corresponds to the parity of the number of pairs of elements whose order is reversed by a permutation: if this number is even then we say the permutation is even and of sign 1, and if it’s odd we say the permutation is odd and of sign −1.
Let’s do some simple examples to see how this works, letting n=3:
The identity permutation reverses the order of exactly zero pairs, and zero is even. Therefore the identity is even, or its sign is positive.
The permutation (12), which swaps 1 and 2 but leaves 3 invariant, changes the ordering of only one pair: the pair (1,2). So this permutation is odd.
The permutation (123) sends 1→2,2→3,3→1. This permutation changes the ordering of (2,3) and (1,3) but leaves (1,2) invariant, so it reorders exactly two pairs. In other words, it’s even.
Another way to think about the sign is as follows: any permutation of a finite set can be obtained by repeatedly swapping two elements of the set with each other. For example, we can get the permutation (123) defined above by first swapping 2,3 and then swapping 1,2, i.e. (123)=(12)(23). This expression is of course not unique, since we also have (123)=(13)(13)(12)(23), for example: swapping 1,3 two times in a row just gets us back to where we started. However, it turns out given a specific permutation, the parity of the number of pair swaps (or transpositions) we need to perform to obtain it is well defined. In our case, for instance, while we can produce (123) out of two or four transpositions, we can’t produce it using exactly three.
So that’s where the sign means. Knowing this, we can work out the determinant in some explicit cases. For instance, the determinant of a two-by-two matrix is A11A22−A12A21.
This expression still doesn’t look like it should mean anything, so we need to further justify it. The key property of the determinant is that it commutes with matrix multiplication: for two matrices A,B, we have det(AB)=det(A)det(B). If we want to multiply two square matrices and then take their determinant, it doesn’t matter which order we do the operations in: we can take the determinants first and then multiply or the other way around. We’ll get the same answer in the end.
This is a desirable property because matrices are big and complicated objects while numbers are comparatively simpler. The determinant gives us a way to reduce some questions about matrices to questions about numbers, which can be much easier to answer. It does this by throwing away a lot of information about the matrix, but that’s not necessarily a problem depending on what we want to do.
So we want to find a map det which has the property we just mentioned: it commutes with matrix multiplication. Furthermore, we should ask it to not be a trivial map: it shouldn’t just send all matrices to 0 or 1, since a constant function doesn’t give us any information about anything. How might we go about finding such a map?
Finding the determinant
Let’s first look at a special class of matrices. We know that a vector space Rn, for example, has an obvious basis consisting of the vectors
e1=(1,0,0,…,0)e2=(0,1,0,…,0)…en=(0,0,…,0,1)
We can form a subgroup of n×n matrices closed under multiplication by just looking at matrices where the columns are a permutation of these vectors. Now, we see a connection with the sign of a permutation: it’s the only nontrivial way we know (and in fact it’s the only way to do it at all!) to assign a scalar value to a permutation in a way that commutes with composition, which in this special case we know the determinant must do. Therefore, we can tentatively define
det(eσ(1),eσ(2),…,eσ(n))=(−1)σ
where the notation represents that this is the determinant of a matrix with columns eσ(1),eσ(2),… respectively. This tells us the determinant has something to do with signs of permutations. In fact, combined with the fact that the determinant commutes with matrix multiplication, by multiplying any matrix A with rows A1,A2,…,An by a permutation matrix (eσ(1),eσ(2),…,eσ(n)), we can deduce
det(Aσ(1),Aσ(2),…,Aσ(n))=(−1)σdet(A)
In other words, if seen as a map defined on the rows (or columns) of a matrix, the determinant is alternating: swapping two rows or columns changes the sign of its value. We can infer from this that if a matrix has two identical rows or columns its determinant will be zero.
Here we seem to be stuck again: the problem is the assumptions we’ve made so far are too weak. We can look to make them stronger while making the determinant have more and more nice properties in the process.
The nicest property we could ask for is that det is linear as a function of matrices, that is, det(A+B)=det(A)+det(B). However, this condition is much too strong. For example, if we take the 3×3 identity matrix and apply the permutation (123) to its columns twice to get a total of three matrices, it’s easy to see that all of them are of determinant 1 but their sum is a matrix with all entries equal to 1 and so must have determinant 0. In other words, if we ask det to commute with addition as well as multiplication, we can’t continue our construction.
What is weaker than this that we could hope to ask for? Well, we’ve already seen that the determinant can be seen as a function of the columns of a matrix, so perhaps instead of being linear in the whole matrix it’s linear just in the columns and rows. In other words,
and since the determinant is alternating this will generalize to all other columns as well.
It may not seem like it, but we already have enough information now to determine det uniquely. Indeed, this is because any n×n matrix A can be expressed as
(n∑i=1Ai1ei,n∑i=1Ai2ei,…,n∑i=1Ainei)
If this is not clear to you, simply imagine we decompose each column into a linear combination of the elementary basis vectors ei.
We know the determinant is linear, and we know what values the determinant takes on permutation matrices, so we can actually evaluate the determinant of this directly using the properties we’ve postulated so far. We simply “expand out” the map using linearity to get
where the outside sum runs over all functions from {1,2,…,n} to itself. Now all we have to do is to plug in the values of determinants we’ve already figured out. If f is not a permutation, in other words if it takes two identical values, then the determinant will be zero. So we can assume f is actually a permutation, in which case we already know that the determinant must equal the sign of f. In other words,
detA=∑σ∈Sn(−1)σn∏i=1Aiσ(i)
This is obviously alternating and multilinear, and we can show that any such map must actually commute with matrix multiplication, so we’ve found the map we were looking for.
Takeaways
The way I went about finding the determinant above might not be intuitive. At first glance, it looks counterproductive to add further assumptions about the map when we only want it to commute with matrix multiplication. However, there are two reasons why this is a good strategy:
If it works, the map we end up with is much more regular and well behaved than what we would’ve obtained if we picked an arbitrary map just satisfiyng our original condition. In our context, we clearly want det to be as well behaved as possible, so this is only beneficial to us.
Narrowing the search space down to maps we understand better is a good start in any search process. Either it succeeds, in which case the additional assumptions will have only made things easier for us; or it fails, in which case we learn a useful fact that a map having some collection of properties is actually impossible. Failure can give us some insight into how we might want to weaken our conditions in order to be successful the next time around.
Whence the determinant?
The determinant is a rather bizarre concept when one first encounters it. Given a square n×n matrix A, it’s defined as
detA=∑σ∈Sn(−1)σn∏i=1Aiσ(i)
where Sn is the symmetric group on n elements—the collection of all permutations of a set of n distinct objects—and (−1)σ is the sign of the permutation σ. Aij represents the entry of the matrix A on the ith row and jth column.
This definition means that the determinant is a sum that has one term for each permutation σ of the columns of the matrix A, and these terms are the product of n entries of the matrix up to a funny sign (−1)σ depending on the specific permutation σ.
To understand this definition, it’s essential that we first understand what the sign of a permtuation is.
Signs
The sign corresponds to the parity of the number of pairs of elements whose order is reversed by a permutation: if this number is even then we say the permutation is even and of sign 1, and if it’s odd we say the permutation is odd and of sign −1.
Let’s do some simple examples to see how this works, letting n=3:
The identity permutation reverses the order of exactly zero pairs, and zero is even. Therefore the identity is even, or its sign is positive.
The permutation (12), which swaps 1 and 2 but leaves 3 invariant, changes the ordering of only one pair: the pair (1,2). So this permutation is odd.
The permutation (123) sends 1→2,2→3,3→1. This permutation changes the ordering of (2,3) and (1,3) but leaves (1,2) invariant, so it reorders exactly two pairs. In other words, it’s even.
Another way to think about the sign is as follows: any permutation of a finite set can be obtained by repeatedly swapping two elements of the set with each other. For example, we can get the permutation (123) defined above by first swapping 2,3 and then swapping 1,2, i.e. (123)=(12)(23). This expression is of course not unique, since we also have (123)=(13)(13)(12)(23), for example: swapping 1,3 two times in a row just gets us back to where we started. However, it turns out given a specific permutation, the parity of the number of pair swaps (or transpositions) we need to perform to obtain it is well defined. In our case, for instance, while we can produce (123) out of two or four transpositions, we can’t produce it using exactly three.
So that’s where the sign means. Knowing this, we can work out the determinant in some explicit cases. For instance, the determinant of a two-by-two matrix is A11A22−A12A21.
This expression still doesn’t look like it should mean anything, so we need to further justify it. The key property of the determinant is that it commutes with matrix multiplication: for two matrices A,B, we have det(AB)=det(A)det(B). If we want to multiply two square matrices and then take their determinant, it doesn’t matter which order we do the operations in: we can take the determinants first and then multiply or the other way around. We’ll get the same answer in the end.
This is a desirable property because matrices are big and complicated objects while numbers are comparatively simpler. The determinant gives us a way to reduce some questions about matrices to questions about numbers, which can be much easier to answer. It does this by throwing away a lot of information about the matrix, but that’s not necessarily a problem depending on what we want to do.
So we want to find a map det which has the property we just mentioned: it commutes with matrix multiplication. Furthermore, we should ask it to not be a trivial map: it shouldn’t just send all matrices to 0 or 1, since a constant function doesn’t give us any information about anything. How might we go about finding such a map?
Finding the determinant
Let’s first look at a special class of matrices. We know that a vector space Rn, for example, has an obvious basis consisting of the vectors
e1=(1,0,0,…,0)e2=(0,1,0,…,0)…en=(0,0,…,0,1)
We can form a subgroup of n×n matrices closed under multiplication by just looking at matrices where the columns are a permutation of these vectors. Now, we see a connection with the sign of a permutation: it’s the only nontrivial way we know (and in fact it’s the only way to do it at all!) to assign a scalar value to a permutation in a way that commutes with composition, which in this special case we know the determinant must do. Therefore, we can tentatively define
det(eσ(1),eσ(2),…,eσ(n))=(−1)σ
where the notation represents that this is the determinant of a matrix with columns eσ(1),eσ(2),… respectively. This tells us the determinant has something to do with signs of permutations. In fact, combined with the fact that the determinant commutes with matrix multiplication, by multiplying any matrix A with rows A1,A2,…,An by a permutation matrix (eσ(1),eσ(2),…,eσ(n)), we can deduce
det(Aσ(1),Aσ(2),…,Aσ(n))=(−1)σdet(A)
In other words, if seen as a map defined on the rows (or columns) of a matrix, the determinant is alternating: swapping two rows or columns changes the sign of its value. We can infer from this that if a matrix has two identical rows or columns its determinant will be zero.
Here we seem to be stuck again: the problem is the assumptions we’ve made so far are too weak. We can look to make them stronger while making the determinant have more and more nice properties in the process.
The nicest property we could ask for is that det is linear as a function of matrices, that is, det(A+B)=det(A)+det(B). However, this condition is much too strong. For example, if we take the 3×3 identity matrix and apply the permutation (123) to its columns twice to get a total of three matrices, it’s easy to see that all of them are of determinant 1 but their sum is a matrix with all entries equal to 1 and so must have determinant 0. In other words, if we ask det to commute with addition as well as multiplication, we can’t continue our construction.
What is weaker than this that we could hope to ask for? Well, we’ve already seen that the determinant can be seen as a function of the columns of a matrix, so perhaps instead of being linear in the whole matrix it’s linear just in the columns and rows. In other words,
det(r1A1+r2B1,A2,…,An)=r1det(A1,A2,…,An)+r2det(B1,A2,…,An)
and since the determinant is alternating this will generalize to all other columns as well.
It may not seem like it, but we already have enough information now to determine det uniquely. Indeed, this is because any n×n matrix A can be expressed as
(n∑i=1Ai1ei,n∑i=1Ai2ei,…,n∑i=1Ainei)
If this is not clear to you, simply imagine we decompose each column into a linear combination of the elementary basis vectors ei.
We know the determinant is linear, and we know what values the determinant takes on permutation matrices, so we can actually evaluate the determinant of this directly using the properties we’ve postulated so far. We simply “expand out” the map using linearity to get
det(n∑i=1Ai1ei,n∑i=1Ai2ei,…,n∑i=1Ainei)=∑f:n→ndet(ef(1),ef(2),…,ef(n))n∏i=1Aif(i)
where the outside sum runs over all functions from {1,2,…,n} to itself. Now all we have to do is to plug in the values of determinants we’ve already figured out. If f is not a permutation, in other words if it takes two identical values, then the determinant will be zero. So we can assume f is actually a permutation, in which case we already know that the determinant must equal the sign of f. In other words,
detA=∑σ∈Sn(−1)σn∏i=1Aiσ(i)
This is obviously alternating and multilinear, and we can show that any such map must actually commute with matrix multiplication, so we’ve found the map we were looking for.
Takeaways
The way I went about finding the determinant above might not be intuitive. At first glance, it looks counterproductive to add further assumptions about the map when we only want it to commute with matrix multiplication. However, there are two reasons why this is a good strategy:
If it works, the map we end up with is much more regular and well behaved than what we would’ve obtained if we picked an arbitrary map just satisfiyng our original condition. In our context, we clearly want det to be as well behaved as possible, so this is only beneficial to us.
Narrowing the search space down to maps we understand better is a good start in any search process. Either it succeeds, in which case the additional assumptions will have only made things easier for us; or it fails, in which case we learn a useful fact that a map having some collection of properties is actually impossible. Failure can give us some insight into how we might want to weaken our conditions in order to be successful the next time around.