Kaj_Sotala comments on Regularization Causes Modularity Causes Generalization

Kaj_Sotala 2 Jan 2022 7:23 UTC
11 points
Great post!
> If you expect no failures at all, you should let modules be as specialized as possible in order to maximize performance.
> - Do that, and your modules end up hyperspecialized and interdependent. The borders between different modules wither away; you no longer have functionally distinct modules to speak of. You have a spaghetti tower.
I’m a little confused by this bit, because intuitively it feels like hyperspecialization = hypermodularity. If a module is a computational unit that carries out a specific task, shouldn’t increased specialization lead to lots and lots of modules, each focused on some very narrow task?
- dkirmani 2 Jan 2022 7:49 UTC
  8 points
  Parent
  Thank you!
  
  Yeah, that passage doesn’t effectively communicate what I was getting at. (Edit: I modified the post so that it now actually introduces the relevant quote instead of dumping it directly into the reader’s visual field.) I was gesturing at the quote from Design Principles of Biological Circuits that says that if you evolve an initially modular network towards a fixed goal (without dropout/regularization), the network sacrifices its existing modularity to eke out a bit more performance. I was also trying to convey that the dropout rate sets the specialization/redundancy tradeoff.
  
  So yeah, a lack of dropout would lead to “lots and lots of modules, each focused on some very narrow task”, if it weren’t for the fact that removing dropout also blurs the boundaries between those modules: the optimizer is free to add connections that break modularity but increase fitness. Without dropout there is no pressure for redundancy, and therefore less pressure for modularity, so more of these connections survive. I hope that’s a more competent explanation of the point I was trying to make.
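
One way to make "connections that break modularity" concrete is Newman's modularity score Q, which compares the fraction of edges that fall inside modules with the fraction expected if edges were rewired at random while keeping node degrees. The sketch below is illustrative only: the toy graph, the function name `modularity`, the "shortcut" edges, and the choice of Q as the metric are all assumptions made for this example, not something taken from the post or from Design Principles of Biological Circuits.

```python
from itertools import combinations


def modularity(edges, community):
    """Newman's modularity Q for an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> module label.
    """
    m = len(edges)
    degree = {node: 0 for node in community}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Observed fraction of edges with both endpoints in the same module.
    inside = sum(1 for u, v in edges if community[u] == community[v]) / m

    # Expected fraction under degree-preserving random rewiring.
    expected = 0.0
    for label in set(community.values()):
        module_degree = sum(d for n, d in degree.items() if community[n] == label)
        expected += (module_degree / (2 * m)) ** 2

    return inside - expected


# Hypothetical toy network: two fully connected 4-node modules and one bridge.
module_a = [0, 1, 2, 3]
module_b = [4, 5, 6, 7]
community = {n: "A" for n in module_a} | {n: "B" for n in module_b}

modular_edges = (
    list(combinations(module_a, 2))
    + list(combinations(module_b, 2))
    + [(3, 4)]
)

# Assumed "shortcut" connections an optimizer might add for extra fitness
# when nothing rewards redundancy.
shortcut_edges = [(0, 5), (1, 6), (2, 7)]

q_modular = modularity(modular_edges, community)
q_blurred = modularity(modular_edges + shortcut_edges, community)

print(f"Q with one bridge:     {q_modular:.3f}")
print(f"Q with extra shortcuts: {q_blurred:.3f}")
```

In this toy case the modules themselves are unchanged, yet Q falls once the cross-module shortcuts are added, which is the "blurred boundaries" effect in miniature.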