Yeah, that passage doesn’t effectively communicate what I was getting at. (Edit: I modified the post so that it now actually introduces the relevant quote instead of dumping it directly into the reader’s visual field.) I was gesturing at the quote from Design Principles of Biological Circuits that says that if you evolve an initially modular network towards a fixed goal (without dropout/regularization), the network sacrifices its existing modularity to eke out a bit more performance. I was also trying to convey that the dropout rate sets the specialization/redundancy tradeoff.
So yeah, a lack of dropout would lead to “lots and lots of modules, each focused on some very narrow task”, if it wasn’t for the fact that not having dropout would also blur the boundaries between those modules by allowing the optimizer to make more connections that break modularity and increase fitness. Not having dropout would allow more of these connections because there would be no pressure for redundancy, which means less pressure for modularity. I hope that’s a more competent explanation of the point I was trying to make.
Thank you!
Yeah, that passage doesn’t effectively communicate what I was getting at. (Edit: I modified the post so that it now actually introduces the relevant quote instead of dumping it directly into the reader’s visual field.) I was gesturing at the quote from Design Principles of Biological Circuits that says that if you evolve an initially modular network towards a fixed goal (without dropout/regularization), the network sacrifices its existing modularity to eke out a bit more performance. I was also trying to convey that the dropout rate sets the specialization/redundancy tradeoff.
So yeah, a lack of dropout would lead to “lots and lots of modules, each focused on some very narrow task”, if it wasn’t for the fact that not having dropout would also blur the boundaries between those modules by allowing the optimizer to make more connections that break modularity and increase fitness. Not having dropout would allow more of these connections because there would be no pressure for redundancy, which means less pressure for modularity. I hope that’s a more competent explanation of the point I was trying to make.