Cool post! I have done some similar work, and my hypothesis for why dropout may inhibit superposition is that spreading feature representations across multiple neurons exposes them to a higher chance of being perturbed. If a feature is represented across n neurons with dropout probability p, the chance that the feature is not perturbed at all is (1−p)^n.
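To make that concrete, here's a minimal sketch (my own, not from the post) that checks the (1−p)^n survival probability by simulation; the values of n and p are just illustrative:

```python
import numpy as np

def survival_rate(n, p, trials=100_000, seed=0):
    """Fraction of trials in which none of a feature's n neurons are dropped."""
    rng = np.random.default_rng(seed)
    # Drop each neuron independently with probability p;
    # the feature is "unperturbed" only if all n neurons survive.
    dropped = rng.random((trials, n)) < p
    return (~dropped.any(axis=1)).mean()

p = 0.1
for n in (1, 2, 4, 8):
    print(f"n={n}: empirical={survival_rate(n, p):.4f}, analytic={(1 - p) ** n:.4f}")
```

So the more neurons a feature is spread over, the less often it gets through dropout intact, which is the sense in which dropout penalizes distributed (superposed) representations.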
I think it's pretty intuitive that dropout inhibits superposition; there's a sense in which superposition seems like a 'delicate' phenomenon, and adding random noise obviously raises the noise floor of what the model can represent. It might be possible to make some more quantitative predictions about this in the ReLU output model, which would be cool, though maybe not that important relative to doing more on the effect of dropout on superposition in more realistic models.