Yes, that was the point. At least at first blush, this line of argument looks like it’s showing the opposite of what it purports to, so maybe it isn’t that great of an explanation.
On a separate note, I think the math I referenced above can now be updated to say: broadness depends on the number of orthogonal features a network has, and on how large the norms of those features are, where feature orthogonality and norm are both defined via the L2 Hilbert space inner product, which you may know from quantum mechanics.
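As a minimal sketch of what this could look like in practice: if each feature is treated as a function of the network's input, its L2 inner products with other features can be estimated empirically from samples. Everything below (the random data, the feature count, using the numerical rank of the Gram matrix as the orthogonal-feature count) is an illustrative assumption of mine, not the actual math from the post.

```python
import numpy as np

# Illustrative sketch: represent each "feature" by its values on a sample
# of inputs drawn from the data distribution. The L2 (Hilbert space) inner
# product of features f and g is approximated by the empirical mean of
# f(x) * g(x) over those samples.
rng = np.random.default_rng(0)
n_samples, n_features = 1000, 5

# Rows: feature values on the sampled inputs (made-up data for the sketch).
F = rng.normal(size=(n_features, n_samples))

# Empirical Gram matrix of inner products: G[i, j] ~ E_x[f_i(x) f_j(x)].
G = F @ F.T / n_samples

# Feature norms are the square roots of the diagonal entries of G.
norms = np.sqrt(np.diag(G))

# A crude count of effectively orthogonal features: the numerical rank of G.
n_orthogonal = np.linalg.matrix_rank(G)
print(n_orthogonal, norms.round(2))
```

Under this (assumed) framing, "number of orthogonal features" and "feature norms" are just the rank and diagonal of the empirical Gram matrix.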
This neatly encapsulates, extends, and quantifies the “information loss” notion in Vivek’s linked post above. It also sounds a lot like it’s formalising intuitions about broadness being connected to “generality”, “simplicity”, and lack of “fine tuning”.
It also makes me suspect that the orthogonal feature basis is the fundamentally correct way to think about computations in neural networks.
Post on this incoming once I figure out how to explain it to people who haven’t used Hilbert space before.