Distilling and approaches to the determinant
There are two steps to deeply understanding a complicated problem:
1. Get a handle on what is going on
2. Distill the idea down to the very core fundamentals
We’re going to talk about step (2) here. Having a deep, elegant understanding of a concept makes it much easier to effectively use without needing to memorize tons of details. It also often makes it easier to guide others towards an understanding of the topic, since you can focus on conveying the central ideas.
I’m currently taking a linear algebra class, so I decided to analyze several different approaches to describing the concept of the determinant and see what I could learn about distillation. If you aren’t already familiar with linear algebra, probably skim the math details.
Approaches
My college linear algebra lectures
When introducing the determinant, my linear algebra class simply provided a definition without any justification: a determinant is a function from square matrices to the reals which is:
Multilinear on rows
Alternating on rows
Normalized
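For reference, here is one common way to spell out the three conditions, writing the determinant as a function of the rows of an n×n matrix. The notation (u and v for rows, a and b for scalars, dots for the rows held fixed) is an assumption for illustration and may not match how the lectures phrased it.

```latex
\begin{align*}
\text{Multilinear:}\quad & \det(\dots, a\,u + b\,v, \dots) = a\,\det(\dots, u, \dots) + b\,\det(\dots, v, \dots) \\
\text{Alternating:}\quad & \det(\dots, u, \dots, u, \dots) = 0 \\
\text{Normalized:}\quad & \det(I_n) = 1
\end{align*}
```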
We then discussed what exactly each of these conditions means and proved things that follow from them. If you don’t already understand determinants, this looks like you’re just pulling criteria out of a hat.
We then went on to derive, from that definition, the properties that Jim Hefferon uncovers. The initial introduction to a concept being entirely arbitrary doesn’t mean you can’t learn things about the concept. We also eventually showed that the function which gives the volume of the image of a hypercube under a linear transformation satisfies the conditions for being a determinant function.
Linear Algebra by Jim Hefferon
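After explaining how to find the inverse of a matrix, Hefferon gives an exercise: show that $\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$, and thus that a 2x2 matrix is invertible if and only if ad−bc≠0. The ad−bc term is identified as significant; it would be useful to have a generalization to higher dimension matrices.
A good strategy for solving problems is to explore which properties the solution must have, and then search for something with those properties. So we shall start by asking what properties we’d like the determinant formulas to have.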
It’s been established earlier in the textbook that a matrix is invertible iff it does not have any zeros on the diagonal after performing Gaussian elimination. So the determinant might be a function that 1) isn’t impacted by performing Gaussian elimination and 2) is equal to the product of the diagonal (which would be zero if any entries on the diagonal are zero).
However, we find that the ad−bc function is, in fact, impacted by Gaussian elimination. Swapping two rows of a matrix (a step used repeatedly in Gaussian elimination) flips the sign of ad−bc. If you look at $\begin{pmatrix} c & d \\ a & b \end{pmatrix}$ and take the product of the main diagonal minus the product of the other diagonal, you get cb−ad=−(ad−bc). Hefferon shrugs at this and moves on: we care about maintaining zeros, so let’s just say that determinants change sign when you swap rows.
A similar problem happens with multiplying rows by a scalar: $\begin{pmatrix} ka & kb \\ c & d \end{pmatrix}$ gives kad−kbc=k(ad−bc). Again, we shrug and move on: determinants are multiplied by a scalar when you multiply a row by that scalar.
We now give the definition of a determinant: a determinant is a function that
Doesn’t change when you add a scalar multiple of a row to a different row (a step involved in Gaussian reduction)
Swaps sign when you swap two rows
Gets multiplied by a scalar when you multiply a row by a scalar (multilinear)
Is normalized
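To see that these properties are enough to actually compute something, here is a minimal sketch of a determinant calculated purely by row reduction. The function name `det_by_elimination` is made up for illustration, and this is my reading of how the properties combine, not code from Hefferon's book.

```python
from fractions import Fraction


def det_by_elimination(rows):
    """Determinant via row reduction, using only the listed row properties."""
    m = [[Fraction(x) for x in row] for row in rows]  # exact arithmetic
    n = len(m)
    sign = 1
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot: a zero ends up on the diagonal
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign  # swapping two rows changes the sign
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            # Adding a multiple of one row to another changes nothing.
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    # The matrix is now triangular; scaling rows plus normalization
    # give the product of the diagonal entries.
    product = Fraction(1)
    for i in range(n):
        product *= m[i][i]
    return sign * product


print(det_by_elimination([[1, 2], [3, 4]]))  # ad - bc = 1*4 - 2*3 = -2
print(det_by_elimination([[0, 1], [1, 0]]))  # one row swap from the identity: -1
```

Using `Fraction` keeps the arithmetic exact, so a singular matrix gives exactly zero rather than a small floating-point residue.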
This is motivated a little better: we are not pulling properties out of literal thin air, but conjecturing them based on a term that showed up in a formula we discovered. Still, we do not actually get any sort of fundamental understanding of the determinant: we only understand its properties in light of Gaussian elimination.
The Essence of Linear Algebra by 3Blue1Brown
3Blue1Brown describes the determinant as the factor by which a region’s volume changes after the application of a linear map. From this idea, it can be immediately visualized why the determinant of a diagonal matrix is the product of the entries on the diagonal. The association with matrix rank, and thus invertibility, is also intuitively clear.
The process of computing a determinant is set aside as inessential, though the ad−bc formula is mentioned.
This approach immediately gives an intuitive understanding of what the determinant represents and why we would care about it. It feels, to me, like a concept that means something. I can rederive the properties by reasoning about the concept on its own.
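The volume picture is easy to spot-check in two dimensions. The sketch below pushes the corners of the unit square through a few 2x2 matrices and compares the signed area of the image (via the shoelace formula) with ad−bc. The helper names are hypothetical, and this checks a handful of examples rather than proving anything.

```python
def det2(m):
    (a, b), (c, d) = m
    return a * d - b * c


def apply(m, p):
    """Apply the 2x2 matrix m to the point p = (x, y)."""
    (a, b), (c, d) = m
    x, y = p
    return (a * x + b * y, c * x + d * y)


def signed_area(polygon):
    """Shoelace formula: signed area of a polygon given as a list of vertices."""
    total = 0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2


unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]  # counterclockwise, area 1

for m in ([[2, 0], [0, 3]], [[1, 2], [3, 4]], [[1, 1], [0, 0]]):
    image = [apply(m, p) for p in unit_square]
    # The two printed numbers should agree for each matrix.
    print(m, signed_area(image), det2(m))
```

A negative value means the map flips orientation, and the last matrix squashes the square onto a line, so its image has zero area.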
However, this framing of the determinant is a little fuzzy when you discuss scalar fields other than the real numbers. It’s not really obvious to me how the notion of volume works in finite fields. I mostly expect that just naïvely extending the concept will work fine, but I’d need to think about it in depth to be very sure.
Whence the determinant? by Ege Erdil
Erdil starts by giving the formula $\det A = \sum_{\sigma \in S_n} (-1)^{\sigma} \prod_{i=1}^{n} A_{i\sigma(i)}$, but mostly to make the point that it’s not intuitively clear, on a first encounter, why anyone would care about this bizarre formula, which requires concepts like permutation parity that you might be entirely unfamiliar with.
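The permutation-sum formula can be transcribed almost word for word into code, which at least makes clear what it is asking for. This is a rough sketch with made-up names (`sign`, `det_leibniz`); the parity is computed by counting inversions, which is one of several equivalent ways to define it.

```python
from itertools import permutations
from math import prod


def sign(perm):
    """Parity of a permutation given as a tuple: +1 if even, -1 if odd."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def det_leibniz(a):
    """Sum over all permutations sigma of sign(sigma) * prod_i a[i][sigma(i)]."""
    n = len(a)
    return sum(
        sign(sigma) * prod(a[i][sigma[i]] for i in range(n))
        for sigma in permutations(range(n))
    )


print(det_leibniz([[1, 2], [3, 4]]))  # -2, the same value as ad - bc
```

The sum has n! terms, so this is only useful for small matrices.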
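After describing what this formula means, Erdil moves on to giving an actual justification for the determinant. According to Erdil, the key property of the determinant is that it is multiplicative: det(AB)=det(A)⋅det(B). Erdil then looks at the group of matrices given by permuting the columns of the identity matrix. He then claims:
“Now, we see a connection with the sign of a permutation: it’s the only nontrivial way we know (and in fact it’s the only way to do it at all!)[1] to assign a scalar value to a permutation in a way that commutes with composition, which in this special case we know the determinant must do.”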
It’s not immediately obvious why we’re throwing out trivial assignments here: we could simply give all of these matrices the determinant 1. If we look back at Hefferon, we ran into a similar issue there; Hefferon’s justification was essentially “ad−bc is the 2x2 determinant, and it’s alternating,” which is helpful but not entirely satisfying.
I think the central issue here is that we want the determinant of $\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}$, and analogous matrices in higher dimensions, to be zero. This makes obvious sense under the volume interpretation; you mostly need the idea that the determinant of a singular matrix should be zero. This property rules out the “1” option.
Anyways, after establishing that the determinant should be alternating (on columns), Erdil considers other properties the determinant might have that would be useful. This is a pretty interesting approach: making a wish list of properties, and then hoping that some function has them.
It might be nice for the determinant to be a linear function, but that contradicts already established details. Instead we have it be (multi)linear on the columns of the matrix.
So, in summary, the determinant is characterized by:
det(AB)=det(A)⋅det(B)
det is multilinear on columns
det isn’t just the constant zero function (from a comment by gjm)
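As a sanity check, the first two properties can be verified on a concrete 2x2 example using the ad−bc formula. The matrices, vectors, and helper names below are arbitrary choices for illustration; a check like this shows the properties hold for one example, not that they characterize the determinant.

```python
def det2(m):
    (a, b), (c, d) = m
    return a * d - b * c


def matmul2(x, y):
    """Product of two 2x2 matrices given as lists of rows."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def from_columns(c1, c2):
    """Build a 2x2 matrix from its two columns."""
    return [[c1[0], c2[0]], [c1[1], c2[1]]]


A = [[1, 2], [3, 4]]
B = [[0, 5], [-1, 2]]

# Multiplicativity: det(AB) = det(A) * det(B)
print(det2(matmul2(A, B)), det2(A) * det2(B))

# Linearity in the first column, with the second column w held fixed:
# det([s*u + t*v, w]) = s * det([u, w]) + t * det([v, w])
u, v, w, s, t = (1, 3), (2, -1), (4, 7), 5, -2
mixed = (s * u[0] + t * v[0], s * u[1] + t * v[1])
print(
    det2(from_columns(mixed, w)),
    s * det2(from_columns(u, w)) + t * det2(from_columns(v, w)),
)
```

Each `print` shows the two sides of one identity, so the numbers on a line should match.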
I like this a little better than the first two algebraic characterizations we saw: what we end up with is simpler, and we also passed through some of the same concepts along the way.
Analysis
First: of all these approaches, I think the “determinant as scaling factor for volume” is the most useful for my intuition by far. It makes sense that composing two linear maps will just scale the volume twice, since volume scaling is independent of basis. (In fact, this visualization of the determinant makes it obvious that the determinant is independent of basis—unlike other approaches.)
I think the main take-away is that metaphors are really really powerful. If you can visualize something in an effective way, then all the right intuition sometimes immediately follows.
Second: I find the det(AB)=det(A)⋅det(B) concept more clear and intuitive than the “alternating” or “row operations” ideas. It’s a little surprising how much more intuitive I find this, honestly. It might have something to do with the fact that multiplicativity is something which is fundamental to the idea of a linear map, a concept anyone studying linear algebra ought to be pretty familiar with already.
It’s also, like, one line instead of two lines? I’m not sure how much of a difference that actually makes. The Hefferon approach is sort of like “one line for each kind of row operation”, which is pretty easy to keep track of mentally.
The normalized / “not the zero function” constraints add pretty negligible mental overhead for me, so I don’t think those are worth worrying about.
Third: No one really talked about how the conception of volume relates to the other properties of the determinant very much. I think that’s probably the main thing about the determinant that I don’t have an especially good grasp on yet, though some parts are fairly obvious. Erdil said he’s considered discussing this in a future post.
Some extra notes:
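In the comments of Erdil’s post, Oscar Cunningham says:
“I think the determinant is more mathematically fundamental than the concept of volume. It just seems the other way around because we use volumes in every day life.”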
I do think this is true, in a sense. Similarly, I’d say that the concept of an abelian group is more mathematically fundamental than the concept of addition.
But it’s still very intuitively useful, I think, to conceive of abelian groups as generalizations of addition, and the determinant as a “generalization” of volume. It’s intuitively much easier to work outwards from specific examples than to build from the ground up, for me.
Some people proposed thinking of determinants in terms of exterior algebra, which is a concept I’m not familiar enough with to comment on yet.
Conclusion / tl;dr
When you distill concepts, some possible goals are:
Use tight metaphors
Chunk component concepts
Provide specific examples in addition to more general points
Or, really succinctly: metaphors, chunking, examples.
I initially was a little confused by Erdil’s wording. I believe he does not mean “it’s the only way to do it at all, even including trivial ways”, but rather “there aren’t any other nontrivial ways.”
I think this statement might be false if you’re working in the complex numbers: 1 and −1 are special in the reals because they’re the only roots of unity, but there are other options if you’re using complex numbers. Something to ponder later.