A related thing that’s special about the L2 norm is that there’s a bilinear form ⟨⋅,⋅⟩: V×V → R such that |v|² = ⟨v,v⟩, so |v| carries the same information as ⟨v,v⟩.
“Ok, so what? Can’t you do the same thing with any integer n, using an n-linear form?” you might reasonably ask. First of all, not quite: it only works for the even integers, because otherwise you need to use absolute values*, which aren’t linear.
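To make the even-integer case concrete, here is a minimal sketch (my own illustration, not from the discussion): for p = 4, the fourth power of the L4 norm is a symmetric 4-linear form evaluated with all four inputs equal, no absolute values required.

```python
# Hypothetical illustration: ||v||_4^4 as a 4-linear form on the diagonal.
import numpy as np

def T(a, b, c, d):
    """A symmetric 4-linear form: T(a, b, c, d) = sum_i a_i b_i c_i d_i."""
    return np.sum(a * b * c * d)

v = np.array([1.0, -2.0, 3.0])
# Because 4 is even, the signs wash out and no absolute values are needed:
assert np.isclose(T(v, v, v, v), np.sum(np.abs(v) ** 4))  # = ||v||_4^4
```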
But the bilinear forms really are the special ones, roughly speaking because they are a similar type of object to linear transformations. By currying, a bilinear form on V is a linear map V → V∗, where V∗ is the space of linear maps V → R. Now the condition of a linear transformation preserving a bilinear form can be written purely in terms of chaining linear maps together. A linear map f: V → W has an adjoint f∗: W∗ → V∗ given by f∗(φ)(v) = φ(f(v)) for φ: W → R, and a linear map f: V → V preserves a bilinear form B: V → V∗ iff f∗ ∘ B ∘ f = B. Using coordinates in an orthonormal basis, the bilinear form is represented by the identity matrix, so if f is represented by the matrix A, this becomes AᵀIA = I, which is where the usual definition AᵀA = I of an orthogonal matrix comes from. For quadrilinear forms etc., you can’t really do anything like this. So it’s L2 for which you get a way of characterizing “norm-preserving” in a clean, linear-algebraic-in-character way, so it makes sense that that would be the one to have a different space of norm-preserving maps than the others.
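A quick numeric sanity check of the coordinate version (a sketch of mine, assuming the standard basis is orthonormal): a rotation matrix A satisfies AᵀIA = I, i.e. it preserves the standard bilinear form.

```python
# Sketch: verify A^T B A = B for an orthogonal A, with B = I the standard form.
import numpy as np

theta = 0.7
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # a rotation, hence orthogonal

B = np.eye(2)  # the standard bilinear form <u, v> = u^T v in an orthonormal basis

assert np.allclose(A.T @ B @ A, B)      # f* o B o f = B in coordinates
assert np.allclose(A.T @ A, np.eye(2))  # the familiar A^T A = I
```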
I also subtly brushed past something that makes L2 a particularly special norm, although I guess it’s not clear if it helps. A nondegenerate bilinear form is the same thing as an isomorphism between V and V∗. If ⟨v,v⟩ > 0 for all v ≠ 0, then taking its square root gives you a norm, and that norm is L2 (though it may be disguised if you weren’t using an orthonormal basis); and if ⟨v,v⟩ isn’t always positive, then you don’t get a norm out of it at all. So L2 is unique among all possible norms in that it induces, and comes from, an identification between your vector space and its dual.
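Here is a small sketch of the “disguised L2” point (my illustration; the Cholesky factorization is just one convenient way to pick the change of basis): any positive-definite B gives the norm √(vᵀBv), which becomes the ordinary L2 norm after a change of coordinates.

```python
# Sketch: a positive-definite bilinear form is the L2 norm in disguise.
import numpy as np

B = np.array([[2.0, 1.0],
              [1.0, 3.0]])   # symmetric positive definite
L = np.linalg.cholesky(B)    # B = L @ L.T, our change of basis

v = np.array([1.5, -0.4])
norm_from_B = np.sqrt(v @ B @ v)                          # the norm induced by B
assert np.isclose(norm_from_B, np.linalg.norm(L.T @ v))   # plain L2 in new coords
```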
*This assumes your vector space is over R, for simplicity. If it’s over C, then you can’t get multilinearity no matter what you do; the way the argument has to go is that you can get close enough by taking the complex conjugate of exactly half of the inputs, and then you get multilinearity from there. Speaking of C, this reminds me that I was inappropriately assuming your vector space was over R in my previous comment. Over C, you can multiply basis vectors by any scalar of absolute value 1, not just +1 and −1. This is broader than the norm-preserving changes of basis you can do over R, to exactly the extent explicable by the fact that you’re sneaking in a little bit of L2 via the definition of the absolute value of a complex number.
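A tiny sketch of the footnote’s point (mine, purely illustrative): numpy’s vdot conjugates its first argument, which is exactly the “conjugate half the inputs” move, and it recovers the squared L2 norm; the absolute value of each complex entry is itself a little L2 norm on R².

```python
# Sketch: over C, <u, v> = sum_i conj(u_i) v_i gives |v|_2^2 on the diagonal.
import numpy as np

v = np.array([1 + 2j, 3 - 1j])
# np.vdot conjugates its first argument: sum_i conj(v_i) * v_i = sum_i |v_i|^2.
assert np.isclose(np.vdot(v, v).real, np.linalg.norm(v) ** 2)
```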
Thanks! I expect I can stare at this and figure something out about why there is no reasonable notion of “triality” in Vect (ie, no 3-way analog of vector space duality—and, like, obviously that’s a little ridiculous, but also there’s definitely still something I haven’t understood about the special-ness of the dual space).
ETA: Also, I’m curious what you think the connection is between the “L2 is connected to bilinear forms” and “L2 is the only Lp metric invariant under nontrivial change of basis”, if it’s easy to state.
FWIW, I’m mostly reading these arguments as being variations on “if you put anything other than a 2 there, your life sucks”, and I believe that, but I still have a sense that the explanation I’m looking for is more about how putting a 2 there is positively natural, not just the best of a bad lot. That said, I’m loving these arguments, and I expect I can mine them for some of the intuition-corrections I seek :-)
I was just thinking back to this, and it occurred to me that one possible reason to be unsatisfied with the arguments I presented here is that I started off with this notion of a crossing-over point as p continuously increases. But then when you asked “ok, but why is the crossing-over point 2?”, I was like “uh, consider that it might be an integer, and then do a bunch of very discrete-looking arguments that end up showing there’s something special about 2”, which doesn’t connect very well with the “crossover point when p continuously varies” picture. If indeed this seemed unsatisfying to you, then perhaps you’ll like this more:
If we have a norm on a vector space, then it induces a norm on its dual space, given by |φ| := max_{|v|=1} |φ(v)|. If a linear map preserves a norm, then its adjoint preserves the induced norm on the dual space.
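As a sketch of the definition (the function name and the sampling approach are mine, purely illustrative), one can estimate the induced dual norm by brute force over the unit sphere of the chosen norm; for p = 2 the estimate approaches ‖φ‖_2, previewing the self-duality below.

```python
# Sketch: estimate |phi| := max over {|v|_p = 1} of |phi(v)| by sampling.
import numpy as np

def dual_norm_estimate(phi, p, n_samples=100_000, seed=0):
    rng = np.random.default_rng(seed)
    vs = rng.standard_normal((n_samples, phi.size))
    vs /= np.linalg.norm(vs, ord=p, axis=1, keepdims=True)  # unit Lp sphere
    return np.abs(vs @ phi).max()

phi = np.array([3.0, -4.0])
# For p = 2, the dual norm is L2 again: this prints a value just under 5.
print(dual_norm_estimate(phi, p=2), np.linalg.norm(phi, ord=2))
```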
Claim: The Lp norm on column vectors induces, as its dual, the Lq norm on row vectors, where p and q satisfy 1/p + 1/q = 1.
Thus if a matrix preserves the Lp norm, then its adjoint preserves the Lq norm. When p = 2, we get that its adjoint preserves the same norm. This sort of gives you a natural way of seeing 2 as halfway between 1 and infinity, and gives, for every p, a corresponding q that is equally far away from the middle in the other direction, in the appropriate sense (e.g., p = 4 pairs with q = 4/3, and p = 1 with q = ∞).
Proof of claim: Given p and q such that 1/p + 1/q = 1, and a row vector φ = (φ_1, ..., φ_n) with Lq norm 1, let x_i = |φ_i|^q, so that x_1 + ... + x_n = 1. Then let v_i := ±x_i^{1/p} (with the same sign as φ_i). The column vector v = (v_1, ..., v_n)ᵀ has Lp norm 1. Since |φ_i| = x_i^{1/q}, we get φv = φ_1v_1 + ... + φ_nv_n = x_1^{1/p+1/q} + ... + x_n^{1/p+1/q} = x_1 + ... + x_n = 1. This shows that the dual-Lp norm of φ is at least 1. Standard constrained optimization techniques will verify that this v maximizes φv subject to the constraint that v has Lp norm 1, and thus that the dual-Lp norm of φ is exactly 1.
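A numeric check of the construction in this proof (my sketch; the specific φ and p are arbitrary):

```python
# Sketch: verify that v_i = sign(phi_i) * x_i^(1/p), with x_i = |phi_i|^q,
# achieves phi(v) = 1 on the unit Lp sphere when ||phi||_q = 1.
import numpy as np

p = 4
q = p / (p - 1)                    # Hölder conjugate: 1/p + 1/q = 1

phi = np.array([0.5, -1.0, 2.0])
phi /= np.linalg.norm(phi, ord=q)  # normalize so ||phi||_q = 1

x = np.abs(phi) ** q               # x_i = |phi_i|^q, so x sums to 1
v = np.sign(phi) * x ** (1 / p)    # v_i = ±x_i^(1/p)

assert np.isclose(np.linalg.norm(v, ord=p), 1.0)  # ||v||_p = 1
assert np.isclose(phi @ v, 1.0)                   # phi(v) hits the maximum
```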
Corollary: If a matrix preserves the Lp norm for some p ≠ 2, then it is a permutation matrix (up to flipping the signs of some of its entries).
Proof: Let q be such that 1/p + 1/q = 1. The columns of the matrix each have Lp norm 1, so the whole matrix has entrywise Lp norm n^{1/p} (since the entries from each of the n columns contribute 1 to the sum). By the same reasoning applied to its adjoint, the matrix has entrywise Lq norm n^{1/q}. Assume wlog p < q. The Lq norm is ≤ the Lp norm when q > p, with equality only on scalar multiples of basis vectors. So if any column of the matrix isn’t a basis vector (up to sign), then its Lq norm is less than 1; meanwhile, all the columns have Lq norm at most 1, so this would mean that the Lq norm of the whole matrix is strictly less than n^{1/q}, contradicting the argument about its adjoint.
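Consistent with the corollary, here is a quick sketch of mine: a signed permutation preserves every Lp norm, while a generic rotation preserves only L2.

```python
# Sketch: signed permutations preserve all Lp norms; rotations only L2.
import numpy as np

P = np.array([[0.0, -1.0],
              [1.0,  0.0]])  # a signed permutation matrix
t = 0.3
R = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])  # a generic rotation

v = np.array([1.0, 2.0])
for p in (1, 2, 4):
    keeps_P = np.isclose(np.linalg.norm(P @ v, ord=p), np.linalg.norm(v, ord=p))
    keeps_R = np.isclose(np.linalg.norm(R @ v, ord=p), np.linalg.norm(v, ord=p))
    print(p, keeps_P, keeps_R)  # P: True for all p; R: True only for p = 2
```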
Also, I’m curious what you think the connection is between the “L2 is connected to bilinear forms” and “L2 is the only Lp metric invariant under nontrivial change of basis”, if it’s easy to state.
This was what I was trying to vaguely gesture towards with the derivation of the “transpose = inverse” characterization of L2-preserving matrices; the idea was that the argument was a natural sort of thing to try, so if it works to get us a characterization of the Lp-preserving matrices for exactly one value of p, then that’s probably the one that has a different space of Lp-preserving matrices than the rest. But perhaps this is too sketchy and mysterian. Let’s try a dimension-counting argument.
Linear transformations Rⁿ → Rⁿ and bilinear forms Rⁿ × Rⁿ → R can both be represented with n×n matrices. Linear transformations act on the space of bilinear forms by applying the linear transformation to both inputs before plugging them into the bilinear form. If the matrix A represents a linear transformation and the matrix B represents a bilinear form, then the matrix representing the bilinear form you get from this action is AᵀBA. But whatever, the point is, so far we have an n²-dimensional group acting on an n²-dimensional space. But quadratic forms (like the square of the L2 norm) can be represented by symmetric n×n matrices, the space of which is (n+1 choose 2)-dimensional, and if B is symmetric, then so is AᵀBA. So now we have an n²-dimensional group acting on an (n+1 choose 2)-dimensional space, so the stabilizer of any given element must be at least n² − (n+1 choose 2) = (n choose 2) dimensional. As it turns out, this is exactly the dimensionality of the space of orthogonal matrices, but the important thing is that this is nonzero, which explains why the space of orthogonal matrices must not be discrete.
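The count itself, as a sketch (mine; just arithmetic):

```python
# Sketch: dim GL(n) minus dim of symmetric bilinear forms = lower bound on the
# stabilizer dimension, which matches dim O(n) = (n choose 2).
from math import comb

for n in range(2, 6):
    group_dim = n * n                      # n x n matrices
    sym2_dim = comb(n + 1, 2)              # symmetric n x n matrices
    print(n, group_dim, sym2_dim, group_dim - sym2_dim, comb(n, 2))
```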
Now let’s see what happens if we try to adapt this argument to Lp and p-linear forms for some p≠2.
With p=1, a linear transformation preserving a linear functional corresponds to a matrix A preserving a row vector φ in the sense that φA = φ. You can do a dimension-counting argument and find that there are tons of these matrices for any given row vector, but it doesn’t do you any good, because 1 isn’t even, so preserving the linear functional doesn’t mean you preserve the L1 norm.
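A concrete instance of this (my sketch): a shear preserves the functional φ = (0, 1) but stretches L1 lengths.

```python
# Sketch: a shear preserves the functional phi = (0, 1) (phi @ A == phi) but
# not the L1 norm, illustrating why the p = 1 count buys nothing.
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
phi = np.array([0.0, 1.0])
assert np.allclose(phi @ A, phi)      # the functional is preserved

e2 = np.array([0.0, 1.0])
print(np.linalg.norm(A @ e2, ord=1))  # 2.0, not 1.0: L1 norm not preserved
```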
Let’s try p=4, then. A 4-linear form Rⁿ × Rⁿ × Rⁿ × Rⁿ → R can be represented by an n×n×n×n hypermatrix, the space of which is n⁴-dimensional. Again, we can restrict attention to the symmetric ones, which are preserved by the action of linear maps. But the space of symmetric n×n×n×n hypermatrices is (n+3 choose 4)-dimensional, still much more than n². This means that our linear maps can use up all of their degrees of freedom moving a symmetric 4-linear form around to different 4-linear forms without even getting close to filling up the whole space, and the group never gets forced to spend its surplus degrees of freedom on linear maps that stabilize a 4-linear form, so it doesn’t give us linear maps stabilizing the L4 norm.
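And the analogous count for p = 4 (same kind of sketch as above):

```python
# Sketch: for symmetric 4-linear forms, the form space (n+3 choose 4) already
# exceeds the n^2 degrees of freedom of the linear maps, so no stabilizer
# dimension is forced.
from math import comb

for n in range(2, 8):
    print(n, n * n, comb(n + 3, 4))  # group dim vs symmetric 4-form dim
```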