Shouldn’t this create strong regularisation favouring using meaningful directions over meaningful polytopes?
Yes, that seems reasonable!
One thing we want to emphasize is that it’s perfectly possible to have both meaningful directions and meaningful polytopes. For instance, if all polytope boundaries intersect the origin, then all polytopes will be unbounded. In that case, polytopes will essentially be directions!
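A toy check of the point above, as a sketch rather than anything from the post: for a single ReLU layer with no biases, every boundary hyperplane passes through the origin, so the activation pattern (which polytope an input sits in) depends only on the input's direction and not its scale. The names `activation_pattern` and `scale_invariant` are made up for this illustration, and the weights are random, not taken from any trained model.

```python
import random

def activation_pattern(weights, biases, x):
    """Return a tuple of booleans: which ReLU units have pre-activation > 0 at x."""
    pattern = []
    for row, b in zip(weights, biases):
        pre = sum(w * xi for w, xi in zip(row, x)) + b
        pattern.append(pre > 0)
    return tuple(pattern)

def scale_invariant(weights, biases, x, scales=(0.25, 2.0, 1024.0)):
    """True if positively rescaling x never changes its activation pattern."""
    base = activation_pattern(weights, biases, x)
    return all(
        activation_pattern(weights, biases, [c * xi for xi in x]) == base
        for c in scales
    )

# Hypothetical random layer: 3 inputs, 8 hidden units, no biases.
rng = random.Random(0)
n_in, n_hidden = 3, 8
weights = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
zero_biases = [0.0] * n_hidden
x = [rng.gauss(0, 1) for _ in range(n_in)]

# Without biases, all boundaries pass through the origin: polytopes are cones.
print(scale_invariant(weights, zero_biases, x))

# With a bias, the boundary is shifted off the origin and scale starts to matter:
# the unit computes x[0] - 1, which is negative at x[0] = 0.5 but positive at 512.
print(scale_invariant([[1.0, 0.0, 0.0]], [-1.0], [0.5, 0.0, 0.0]))
```

The first check should come out true and the second false. The same argument carries over to deeper bias-free ReLU networks, since ReLU commutes with positive scaling, but the sketch only exercises one layer.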
The polytope lens only becomes relevant when trying to explain what perfectly linear models can’t account for. Although LN might create a bias toward directions, each layer is still nonlinear; nonlinearities probably still need to be accounted for somewhere in our explanations.
All this said, we haven’t thought a lot about LN in this context. It’d be great to know if this regularisation is real and if it’s strong enough that we can reason about networks without thinking about polytopes.
The polytope lens only becomes relevant when trying to explain what perfectly linear models can’t account for. Although LN might create a bias toward directions, each layer is still nonlinear; nonlinearities probably still need to be accounted for somewhere in our explanations.
Re this: it somewhat conflicts with my understanding of the direction lens. The point is not that things are perfectly linear. The point is that we can interpret directions after a non-linear activation function. The non-linearities are used between interpretable spaces to do some transformation mapping meaningful directions to new meaningful directions (and the exact details of how it does this are the circuits to interpret). See, e.g., my modular addition work for a very concrete example of this.
It’s mathematically true that any operation of a ReLU network will be manipulating polytopes (including a randomly initialised network!), and I understood the key claim of this post is that the polytope lens more naturally maps onto interpreting the network and figuring out what’s going on.
A linear function can never do anything interesting to directions—it just transforms the available space, but cannot create new meaningful directions, just superpositions of the old ones.
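To make the last point concrete, here is a minimal sketch with hand-picked 2x2 matrices (hypothetical values, chosen only so the arithmetic is exact). Two stacked linear layers collapse into a single matrix, so every output is just a fixed linear mix of the inputs. Putting a ReLU between the same two layers breaks additivity, which is the room the network needs to compute something genuinely new.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(a, b):
    """Multiply matrix a by matrix b (both lists of rows)."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]

def relu(v):
    return [max(0.0, t) for t in v]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

# Hypothetical toy weights, not from any real model.
W1 = [[1.0, -1.0], [2.0, 0.0]]
W2 = [[1.0, 1.0], [0.0, 3.0]]
x = [1.0, 2.0]
y = [-3.0, 1.0]

# Linear only: applying W1 then W2 is the same as applying the single matrix W2 @ W1.
stacked = matvec(W2, matvec(W1, x))
collapsed = matvec(matmul(W2, W1), x)
print(stacked == collapsed)

def nonlinear(v):
    """The same two layers with a ReLU in between."""
    return matvec(W2, relu(matvec(W1, v)))

# With the ReLU, f(x + y) no longer equals f(x) + f(y), so no single matrix reproduces it.
print(nonlinear(add(x, y)), add(nonlinear(x), nonlinear(y)))
```

This only shows that the nonlinearity is what stops the layers from collapsing; it says nothing about whether the resulting directions are meaningful, which is the part the circuits would have to explain.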
Thanks for your interest!
Gotcha, thanks!