Thanks! I expect I can stare at this and figure something out about why there is no reasonable notion of “triality” in Vect (ie, no 3-way analog of vector space duality—and, like, obviously that’s a little ridiculous, but also there’s definitely still something I haven’t understood about the special-ness of the dual space).
ETA: Also, I’m curious what you think the connection is between the “L2 is connected to bilinear forms” and “L2 is the only Lp metric invariant under nontrivial change of basis”, if it’s easy to state.
FWIW, I’m mostly reading these arguments as being variations on “if you put anything other than a 2 there, your life sucks”, and I believe that, but I still have a sense that the explanation I’m looking for is more about how putting a 2 there is positively natural, not just the best of a bad lot. That said, I’m loving these arguments, and I expect I can mine them for some of the intuition-corrections I seek :-)
I was just thinking back to this, and it occurred to me that one possible reason to be unsatisfied with the arguments I presented here is that I started off with this notion of a crossing-over point as p continuously increases. But then when you asked “ok, but why is the crossing-over point 2?”, I was like “uh, consider that it might be an integer, and then do a bunch of very discrete-looking arguments that end up showing there’s something special about 2”, which doesn’t connect very well with the “crossover point when p continuously varies” picture. If indeed this seemed unsatisfying to you, then perhaps you’ll like this more:
If we have a norm on a vector space, then it induces a norm on its dual space, given by $\|\varphi\| := \max_{\|v\|=1} |\varphi(v)|$. If a linear map preserves a norm, then its adjoint preserves the induced norm on the dual space.
Claim: The Lp norm on column vectors induces, as its dual, the Lq norm on row vectors, where p and q satisfy $\frac{1}{p}+\frac{1}{q}=1$.
Thus if a matrix preserves the Lp norm, then its adjoint preserves the Lq norm. When p=2, we get that its adjoint preserves the same norm. This gives a natural way of seeing 2 as halfway between 1 and infinity: for every p there is a corresponding q equally far away from the middle in the other direction, in the appropriate sense (namely, $\frac{1}{p}$ and $\frac{1}{q}$ sit symmetrically about $\frac{1}{2}$).
Proof of claim: Given p and q such that $\frac{1}{p}+\frac{1}{q}=1$, and a row vector $\varphi=(\varphi_1,\ldots,\varphi_n)$ with Lq norm 1, let $x_i=|\varphi_i|^q$, so that $x_1+\ldots+x_n=1$. Then let $v_i := \pm x_i^{1/p}$ (with the same sign as $\varphi_i$). The column vector $v=(v_1,\ldots,v_n)^T$ has Lp norm 1, and $\varphi v = \varphi_1 v_1 + \ldots + \varphi_n v_n = x_1^{\frac{1}{p}+\frac{1}{q}} + \ldots + x_n^{\frac{1}{p}+\frac{1}{q}} = 1$. This shows that the dual-Lp norm of $\varphi$ is at least 1. Standard constrained optimization techniques will verify that this v maximizes $\varphi v$ subject to the constraint that v has Lp norm 1, and thus that the dual-Lp norm of $\varphi$ is exactly 1.
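If it helps, the construction is easy to sanity-check numerically. Here’s a minimal numpy sketch (the helper names `lp_norm` and `dual_witness` are my own, nothing standard):

```python
import numpy as np

def lp_norm(v, p):
    """Entrywise Lp norm of a vector."""
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

def dual_witness(phi, p, q):
    """The maximizer from the proof: v_i = sign(phi_i) * x_i^(1/p), with x_i = |phi_i|^q."""
    x = np.abs(phi) ** q
    return np.sign(phi) * x ** (1.0 / p)

p = 3.0
q = p / (p - 1.0)  # conjugate exponent, so 1/p + 1/q = 1

rng = np.random.default_rng(0)
phi = rng.standard_normal(5)
phi /= lp_norm(phi, q)  # normalize so that ||phi||_q = 1

v = dual_witness(phi, p, q)
print(lp_norm(v, p))  # ~1.0: the witness lies on the Lp unit sphere
print(phi @ v)        # ~1.0: phi attains the value 1 against it

# Brute-force comparison: phi against random unit-Lp vectors never beats 1.
samples = rng.standard_normal((100_000, 5))
norms = np.sum(np.abs(samples) ** p, axis=1) ** (1.0 / p)
print(np.max(samples @ phi / norms))  # approaches 1 from below
```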
Corollary: If a matrix preserves Lp norm for any p≠2, then it is a permutation matrix (up to flipping the signs of some of its entries).
Proof: Let q be such that $\frac{1}{p}+\frac{1}{q}=1$. The columns of the matrix each have Lp norm 1, so the whole matrix has entrywise Lp norm $n^{1/p}$ (since each of the n columns contributes 1 to the sum $\sum_{i,j}|a_{ij}|^p$). By the same reasoning applied to its adjoint, the matrix has entrywise Lq norm $n^{1/q}$. Assume wlog p<q. The Lq norm is ≤ the Lp norm for q>p, with equality only on scalar multiples of basis vectors. So if any column of the matrix isn’t a basis vector (up to sign), then its Lq norm is less than 1; meanwhile, all the columns have Lq norm at most 1, so this would mean that the Lq norm of the whole matrix is strictly less than $n^{1/q}$, contradicting the argument about its adjoint.
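Here’s the contrast in concrete form (another numpy sketch with names of my own choosing): a generic rotation preserves the L2 norm of a vector but changes its L4 norm, while a signed permutation preserves every Lp norm.

```python
import numpy as np

def lp_norm(v, p):
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
signed_perm = np.array([[0.0, -1.0],   # swap the coordinates
                        [1.0,  0.0]])  # and flip one sign

v = np.array([1.0, 2.0])
for p in (1.0, 2.0, 4.0):
    print(p, lp_norm(v, p),
          lp_norm(rotation @ v, p),     # agrees with lp_norm(v, p) only at p = 2
          lp_norm(signed_perm @ v, p))  # agrees for every p
```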
Also, I’m curious what you think the connection is between the “L2 is connected to bilinear forms” and “L2 is the only Lp metric invariant under nontrivial change of basis”, if it’s easy to state.
This was what I was trying to vaguely gesture towards with the derivation of the “transpose = inverse” characterization of L2-preserving matrices; the idea was that the argument was a natural sort of thing to try, so if it works to get us a characterization of the Lp-preserving matrices for exactly one value of p, then that’s probably the one that has a different space of Lp-preserving matrices than the rest. But perhaps this is too sketchy and mysterian. Let’s try a dimension-counting argument.
Linear transformations $\mathbb{R}^n \to \mathbb{R}^n$ and bilinear forms $\mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ can both be represented with n×n matrices. Linear transformations act on the space of bilinear forms by applying the linear transformation to both inputs before plugging them into the bilinear form. If the matrix A represents a linear transformation and the matrix B represents a bilinear form, then the matrix representing the bilinear form you get from this action is $A^T B A$. But whatever, the point is, so far we have an $n^2$-dimensional group acting on an $n^2$-dimensional space. But quadratic forms (like the square of the L2 norm) can be represented by symmetric n×n matrices, the space of which is $\binom{n+1}{2}$-dimensional, and if B is symmetric, then so is $A^T B A$. So now we have an $n^2$-dimensional group acting on a $\binom{n+1}{2}$-dimensional space, so the stabilizer of any given element must be at least $n^2 - \binom{n+1}{2} = \binom{n}{2}$-dimensional. As it turns out, this is exactly the dimension of the group of orthogonal matrices, but the important thing is that it is nonzero, which explains why the space of orthogonal matrices cannot be discrete.
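The arithmetic here is easy to tabulate (a small sketch using Python’s `math.comb` for the binomial coefficients):

```python
from math import comb

for n in range(2, 7):
    group_dim = n * n           # all n×n matrices: the group acting
    form_dim = comb(n + 1, 2)   # symmetric n×n matrices: quadratic forms
    stab_lower_bound = group_dim - form_dim
    # The lower bound on the stabilizer dimension matches dim O(n) = comb(n, 2).
    print(n, group_dim, form_dim, stab_lower_bound, comb(n, 2))
```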
Now let’s see what happens if we try to adapt this argument to Lp and p-linear forms for some p≠2.
With p=1, a linear transformation preserving a linear functional corresponds to a matrix A preserving a row vector $\varphi$ in the sense that $\varphi A=\varphi$. You can do a dimension-counting argument and find that there are tons of these matrices for any given row vector, but it doesn’t do you any good, because 1 isn’t even, so preserving the linear functional doesn’t mean you preserve the L1 norm (the norm $\sum_i |v_i|$ agrees with the functional $\sum_i v_i$ only on vectors with nonnegative coordinates).
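For a concrete instance (a hand-picked 2×2 example of my own, not from the discussion above): the matrix below fixes the functional $\varphi(v)=v_1+v_2$, since each of its columns sums to 1, yet it triples the L1 norm of a basis vector.

```python
import numpy as np

phi = np.array([1.0, 1.0])   # the functional v -> v_1 + v_2, as a row vector
A = np.array([[1.0, -1.0],
              [0.0,  2.0]])  # each column sums to 1, so phi @ A == phi

print(phi @ A)                 # [1. 1.]: the functional is preserved
e2 = np.array([0.0, 1.0])
print(np.sum(np.abs(A @ e2)))  # 3.0: but the L1 norm of e2 (which is 1) is not
```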
Let’s try p=4, then. A 4-linear form $\mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ can be represented by an n×n×n×n hypermatrix, the space of which is $n^4$-dimensional. Again, we can restrict attention to the symmetric ones, which are preserved by the action of linear maps. But the space of symmetric n×n×n×n hypermatrices is $\binom{n+3}{4}$-dimensional, still much more than $n^2$. This means that our linear maps can use up all of their degrees of freedom moving a symmetric 4-linear form around to different 4-linear forms without even coming close to filling up the whole space; they never get forced to spend surplus degrees of freedom on linear maps that stabilize a given 4-linear form, so the argument doesn’t give us linear maps stabilizing the L4 norm.
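Repeating the tabulation for symmetric 4-linear forms makes the failure visible (same sketch conventions as before):

```python
from math import comb

for n in range(2, 7):
    group_dim = n * n            # dimension of the space of linear maps
    form_dim = comb(n + 3, 4)    # symmetric n×n×n×n hypermatrices
    # Negative for every n: the count forces no positive-dimensional stabilizer.
    print(n, group_dim, form_dim, group_dim - form_dim)
```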