What this means for the Jacobian is that the determinant tells us how much space is being squished or expanded in the neighborhood around a point. If the output space is being expanded a lot at some input point, then the neural network is somewhat unstable in that region, since minor alterations in the input could cause huge distortions in the output. By contrast, if the determinant is small, then a small change to the input will hardly make a difference to the output.
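To make this concrete, here is a minimal sketch (my own illustration, not code from the original text) that evaluates the Jacobian of a tiny network at a single input point and prints the absolute determinant, i.e. the factor by which the network locally expands or contracts volume around that point. The two-dimensional toy network and the use of torch.autograd.functional.jacobian are illustrative choices only:

```python
# Sketch: the Jacobian determinant as a local expansion factor (illustrative only).
import torch

torch.manual_seed(0)

# A toy 2-input / 2-output network, so the Jacobian is square and has a determinant.
net = torch.nn.Sequential(
    torch.nn.Linear(2, 16),
    torch.nn.Tanh(),
    torch.nn.Linear(16, 2),
)

x = torch.randn(2)  # the input point we examine

# Jacobian of the network's output with respect to its input at x: a 2x2 matrix.
J = torch.autograd.functional.jacobian(net, x)

# |det J| is how much volume around x gets expanded (>1) or squished (<1) by the network.
print("Jacobian at x:\n", J)
print("local expansion factor |det J|:", torch.det(J).abs().item())
```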
The Frobenius norm is nothing complicated: we square all of the elements in the matrix, take the sum, and then take the square root of this sum.
An alternative characterization of the Frobenius norm better highlights its connection to the motivation for regularizing the Jacobian's Frobenius norm, namely limiting the extent to which small changes in input can cause large changes in output: the Frobenius norm of a matrix J is proportional to the root-mean-square of |Jx| over all unit vectors x (specifically, it equals √n times this root-mean-square, where n is the input dimension, i.e. the number of columns of J).
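As a sanity check on these two descriptions, the following sketch (again my own illustration, with an arbitrary random matrix standing in for a Jacobian) computes the Frobenius norm element-wise and compares it against √n times the root-mean-square of |Jx| over randomly sampled unit vectors x:

```python
# Sketch: two ways to compute the Frobenius norm agree (up to Monte Carlo error).
import numpy as np

rng = np.random.default_rng(0)
n = 5                                   # input dimension
J = rng.standard_normal((3, n))         # a 3x5 stand-in "Jacobian": 5 inputs, 3 outputs

# Element-wise definition: square every entry, sum, square root.
frob_elementwise = np.sqrt((J ** 2).sum())

# Directional characterization: sqrt(n) times the root-mean-square of |Jx|
# over unit vectors x, estimated here with random samples from the unit sphere.
xs = rng.standard_normal((200_000, n))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
rms = np.sqrt(np.mean(np.linalg.norm(xs @ J.T, axis=1) ** 2))
frob_directional = np.sqrt(n) * rms

print(frob_elementwise, frob_directional)
```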
The claim that a small determinant means a small change to the input will hardly make a difference to the output isn't quite true; a small determinant is consistent with small changes in input making arbitrarily large changes in output, just so long as small changes in input in a different direction make sufficiently small changes in output.
Hmm, good point. I suppose that's why we're minimizing the Frobenius norm rather than the determinant; hence the alternative characterization of the Frobenius norm given above.
Yes, although another reason is that the determinant is only defined if the input and output spaces have the same dimension, which they typically don’t.
Makes sense.
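To tie this together, here is a minimal sketch (my own illustration, not the post's code) of what regularizing the Jacobian's Frobenius norm can look like in training; the model, toy data, and regularization strength lambda_jac are placeholders, and the penalty is computed from exact per-example Jacobians for clarity rather than efficiency. Note that the network below has four inputs and three outputs, so its Jacobian has no determinant at all, yet the Frobenius penalty is still well defined:

```python
# Sketch: adding a squared-Frobenius-norm Jacobian penalty to an ordinary loss.
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(
    torch.nn.Linear(4, 32),
    torch.nn.Tanh(),
    torch.nn.Linear(32, 3),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
lambda_jac = 0.1                         # placeholder regularization strength

inputs = torch.randn(8, 4)               # toy batch
targets = torch.randint(0, 3, (8,))      # toy class labels

def jacobian_penalty(model, inputs):
    """Mean squared Frobenius norm of the input-output Jacobian over the batch."""
    penalty = torch.zeros(())
    for x in inputs:
        # create_graph=True so the penalty itself can be backpropagated through.
        J = torch.autograd.functional.jacobian(model, x, create_graph=True)
        penalty = penalty + (J ** 2).sum()
    return penalty / inputs.shape[0]

logits = model(inputs)
loss = torch.nn.functional.cross_entropy(logits, targets)
loss = loss + lambda_jac * jacobian_penalty(model, inputs)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Computing exact per-example Jacobians is usually too expensive for large output dimensions; in practice the penalty is often estimated with random projections instead, but the exact version keeps the sketch short.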