Thanks for your interest in our post and your questions!
Correct me if I’m wrong, but it struck while reading this that you can think of a neural network as learning two things at once…
That seems right!
Can the functions and classes be decoupled? … Could you come up with some other scheme for choosing between a whole bunch of different linear transformations?
It seems possible to come up with other schemes that do this; it just doesn’t seem easy to come up with something that is competitive with neural nets. If I recall correctly, there’s work in previous decades (which I’m struggling to find right now, although it’s easy to find similar more modern work e.g. https://pubmed.ncbi.nlm.nih.gov/23272922/ ) that builds a nonlinear dynamical system using N linear regions centred on N points. This work models a dynamical system, but there’s no reason we can’t just use the same principles for purely feedforward networks. The dynamics of the system are defined by whichever point the current state is closest to. The linear regions can have whatever dynamics you want. But then you’d have to store and look up N matrices, which isn’t great when N is large!
How much of the power of neural networks comes from their ability to learn to classify something into exponentially many different classes vs from the linear transformations that each class implements? Does this question even make sense?
I guess this depends on what you mean by ‘power’.
The 2^N different linear transformations can’t be arbitrary. As mentioned in the post, there’s the constraint that neighboring polytopes implement very similar transformations, because their weight matrices vary by just one row. What would be a reasonable way to measure this degree of constrained-ness?
I’m not sure! On the one hand, we could measure the dissimilarity of the transformations as the Frobenius norm (i.e. distance in matrix-space) of the difference matrix between linearized transformations on both sides of a polytope boundary. On the other hand, this difference can be arbitrarily large if the weights of our model are unbounded, because crossing some polytope boundaries might mean that a neuron with arbitrarily large weights turns on or off.
Thanks for your interest in our post and your questions!
That seems right!
It seems possible to come up with other schemes that do this; it just doesn’t seem easy to come up with something that is competitive with neural nets. If I recall correctly, there’s work in previous decades (which I’m struggling to find right now, although it’s easy to find similar more modern work e.g. https://pubmed.ncbi.nlm.nih.gov/23272922/ ) that builds a nonlinear dynamical system using N linear regions centred on N points. This work models a dynamical system, but there’s no reason we can’t just use the same principles for purely feedforward networks. The dynamics of the system are defined by whichever point the current state is closest to. The linear regions can have whatever dynamics you want. But then you’d have to store and look up N matrices, which isn’t great when N is large!
I guess this depends on what you mean by ‘power’.
I’m not sure! On the one hand, we could measure the dissimilarity of the transformations as the Frobenius norm (i.e. distance in matrix-space) of the difference matrix between linearized transformations on both sides of a polytope boundary. On the other hand, this difference can be arbitrarily large if the weights of our model are unbounded, because crossing some polytope boundaries might mean that a neuron with arbitrarily large weights turns on or off.
Thanks for the answers!