J Bostock comments on Regularization Causes Modularity Causes Generalization

J Bostock 3 Jan 2022 18:19 UTC
5 points
This is a great analysis of different causes of modularity. One thought I have is that L1/L2 and pruning seem similar to one another on the surface, but very different to dropout, and all of those seem very different to goal-varying.
If penalizing the total strength of connections during training is sufficient to enforce modularity, could it be the case that dropout is actually just penalizing connections? (e.g. as the effect of a non-firing neuron is propagated to fewer downstream neurons)
I can’t immediately see a reason why a goal-varying scheme could penalize connections but I wonder if this is in fact just another way of enforcing the same process.
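
The dropout hypothesis can be checked exactly in the simplest case: one linear unit, (inverted) dropout on its inputs, and squared-error loss. Here the expected dropout loss equals the ordinary loss plus p/(1-p) · Σ (w_j x_j)². That extra term is an L2-style penalty on connection strength, scaled by each input's magnitude. The sketch below is a minimal illustration under those assumptions, and the function names are hypothetical. It compares a Monte Carlo estimate against that closed form. The exact identity is specific to this linear, squared-error case and does not carry over unchanged to deep nonlinear networks.

```python
import numpy as np

def dropout_loss_monte_carlo(w, x, y, p, n_samples=200_000, seed=0):
    """Estimate E[(y - w . x_tilde)^2] when each input is dropped with probability p.

    Uses inverted dropout: kept inputs are rescaled by 1 / (1 - p), so E[x_tilde] = x.
    """
    rng = np.random.default_rng(seed)
    keep = rng.random((n_samples, x.size)) >= p   # True with probability 1 - p
    x_tilde = keep * x / (1.0 - p)
    return float(np.mean((y - x_tilde @ w) ** 2))

def dropout_loss_closed_form(w, x, y, p):
    """Ordinary squared error plus a data-dependent L2 penalty on the weights."""
    plain = (y - w @ x) ** 2
    penalty = p / (1.0 - p) * np.sum((w * x) ** 2)
    return float(plain + penalty)

w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 0.3, -0.7])
y, p = 0.2, 0.3
mc = dropout_loss_monte_carlo(w, x, y, p)
cf = dropout_loss_closed_form(w, x, y, p)
print(f"Monte Carlo: {mc:.4f}   closed form: {cf:.4f}")
```

The two numbers should agree up to sampling noise. Note that the penalty acts on weight magnitudes, not directly on the number of connections.
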
- dkirmani 3 Jan 2022 19:01 UTC
  1 point
  Thanks :)
  
  > One thought I have is that L1/L2 and pruning seem similar to one another on the surface, but very different to dropout, and all of those seem very different to goal-varying.
  
  Agreed. Didn’t really get into pruning much because some papers only do weight pruning after training, which isn’t really the same thing as pruning during training, and I don’t want to conflate the two.
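
  The difference between pruning after training and pruning during training can be made concrete with a toy example. The sketch below is a hypothetical illustration and is not taken from any of the papers the post discusses. It applies magnitude pruning to a linear model trained by gradient descent. In the "after" variant, the dense model is trained and its smallest weights are then zeroed with no further training. In the "during" variant, weights are pruned halfway through and training continues with the pruned weights held at zero, so the surviving connections can compensate. The data, the keep-2 budget, and the pruning point are arbitrary assumptions.

```python
import numpy as np

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

def train(X, y, w, steps, lr, mask=None):
    """Full-batch gradient descent on mean squared error.
    If a mask is given, pruned weights are reset to zero after every step."""
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
        if mask is not None:
            w = w * mask
    return w

def magnitude_mask(w, keep):
    """Keep the `keep` largest-magnitude weights and zero the rest."""
    mask = np.zeros_like(w)
    mask[np.argsort(np.abs(w))[-keep:]] = 1.0
    return mask

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 2] += 0.8 * X[:, 0]          # a soon-to-be-pruned input overlaps with a kept one
w_true = np.array([3.0, 2.0, 1.0, 0.1, 0.05])
y = X @ w_true + 0.1 * rng.normal(size=200)

steps, lr, keep = 2000, 0.05, 2
w0 = np.zeros(5)

# Prune after training: train the dense model, then zero small weights (no retraining).
w_dense = train(X, y, w0, steps, lr)
w_after = w_dense * magnitude_mask(w_dense, keep)

# Prune during training: prune halfway, then keep training with the mask fixed.
w_half = train(X, y, w0, steps // 2, lr)
mask = magnitude_mask(w_half, keep)
w_during = train(X, y, w_half * mask, steps - steps // 2, lr, mask=mask)

print("prune after :", np.round(w_after, 3), "MSE", round(mse(X, y, w_after), 3))
print("prune during:", np.round(w_during, 3), "MSE", round(mse(X, y, w_during), 3))
```

  In the "during" run the surviving weight on input 0 grows to absorb part of the signal from the pruned, correlated input. The "after" run cannot do this. Both models end up equally sparse, but they have been shaped by different pressures.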
  
  > Could it be the case that dropout is actually just penalizing connections? (e.g. as the effect of a non-firing neuron is propagated to fewer downstream neurons)
  
  Could very well be; I called this post ‘exploratory’ for a reason. However, you could make the case that dropout has the opposite effect based on the same reasoning. If upstream dropout penalizes downstream performance, why don’t downstream neurons form more connections to upstream neurons in order to hedge against the dropout of a particular critical neuron?
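
  The hedging argument can be illustrated in a toy setting. Assume, hypothetically, that several upstream neurons carry the same signal and that a single linear downstream unit sees them under inverted dropout. The unit can put all of its weight on one input or spread it evenly. Without dropout, both choices give identical predictions. With dropout, the spread configuration has lower expected loss, because losing any single input matters less. The sketch below is a minimal illustration under those assumptions and is not a model of any specific network.

```python
import numpy as np

def dropout_loss(w, x, y, p, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the expected squared error of a single linear unit
    when each input is dropped with probability p (inverted dropout)."""
    rng = np.random.default_rng(seed)
    keep = rng.random((n_samples, x.size)) >= p   # True with probability 1 - p
    preds = (keep * x / (1.0 - p)) @ w
    return float(np.mean((y - preds) ** 2))

n, p = 4, 0.5
x = np.ones(n)        # n redundant upstream neurons carrying the same signal
y = 1.0               # value the downstream neuron should reproduce

concentrated = np.array([1.0, 0.0, 0.0, 0.0])   # rely on one critical neuron
spread = np.full(n, 1.0 / n)                    # hedge across all of them

loss_concentrated = dropout_loss(concentrated, x, y, p)
loss_spread = dropout_loss(spread, x, y, p)
print("concentrated:", loss_concentrated)
print("spread:      ", loss_spread)
```

  With all inputs equal to 1, the expected loss here is p/(1-p) · Σ w_j². That comes to 1.0 for the concentrated weights and 0.25 for the spread ones, and the quantity is smallest when the weight is divided evenly.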
  
  > I can’t immediately see a reason why a goal-varying scheme could penalize connections but I wonder if this is in fact just another way of enforcing the same process.
  
  Oh damn, I meant to write more about goal-varying but forgot to. I should post something about that later. For now, though, here are my rough thoughts on the matter:
  
  I don’t think goal-varying directly imposes connection costs. Goal-varying selects for adaptability (aka generalization ability) because it constantly makes the model adapt to related goals. Since modularity causes generalization, selecting for generalization selects for modularity.
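
  Here is a rough, hypothetical illustration of the argument that a goal-varying schedule rewards modular solutions, in the style of "modularly varying goals" experiments. The goal alternates between two Boolean targets that share the same sub-problems. A model that has factored the task into reusable sub-modules only needs to swap a small combiner when the goal changes. The goals, the switching schedule, and the hand-written "modular model" are all assumptions chosen for clarity, and nothing below is trained.

```python
from itertools import product

# Two hypothetical related goals over four binary inputs. Both share the
# sub-problems s1 = a XOR b and s2 = c XOR d; they differ only in how s1 and s2 combine.
def goal_and(a, b, c, d):
    return (a ^ b) & (c ^ d)

def goal_or(a, b, c, d):
    return (a ^ b) | (c ^ d)

def goal_for_epoch(epoch, switch_every=20):
    """Goal-varying schedule: alternate between the two goals every `switch_every` epochs."""
    return goal_and if (epoch // switch_every) % 2 == 0 else goal_or

# A "modular" solution: fixed sub-modules plus a small swappable combiner.
def submodules(a, b, c, d):
    return a ^ b, c ^ d

COMBINERS = {goal_and: lambda s1, s2: s1 & s2, goal_or: lambda s1, s2: s1 | s2}

def modular_model(goal, a, b, c, d):
    s1, s2 = submodules(a, b, c, d)   # shared across goals, never changed
    return COMBINERS[goal](s1, s2)    # only this part changes when the goal switches

results = []
for epoch in (0, 20, 40):
    goal = goal_for_epoch(epoch)
    ok = all(modular_model(goal, *x) == goal(*x) for x in product((0, 1), repeat=4))
    results.append(ok)
    print(epoch, goal.__name__, ok)
```

  In this framing, a non-modular solution would have to relearn more of its internals at every switch. Repeated switching therefore favors solutions that keep the shared sub-problems in separate, reusable parts.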