Cool post! I have done some similar work, and my hypothesis for why dropout may inhibit superposition is that spreading feature representations across multiple neurons exposes them to a higher chance of being perturbed. If a feature is represented across n neurons with dropout probability p, the chance that the feature is not perturbed at all is (1−p)^n.
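To make that concrete, here's a minimal sketch (my own, not from the post) that checks the (1−p)^n survival probability by simulation; the values of n and p are just illustrative:

```python
import numpy as np

def survival_rate(n, p, trials=100_000, seed=0):
    """Fraction of trials in which none of a feature's n neurons are dropped."""
    rng = np.random.default_rng(seed)
    # Drop each neuron independently with probability p;
    # the feature is "unperturbed" only if all n neurons survive.
    dropped = rng.random((trials, n)) < p
    return (~dropped.any(axis=1)).mean()

p = 0.1
for n in (1, 2, 4, 8):
    print(f"n={n}: empirical={survival_rate(n, p):.4f}, analytic={(1 - p) ** n:.4f}")
```

So the more neurons a feature is spread over, the less often it gets through dropout intact, which is the sense in which dropout penalizes distributed (superposed) representations.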
I think it's pretty intuitive that dropout inhibits superposition; there's a sense in which superposition seems like a 'delicate' phenomenon, and adding random noise obviously raises the noise floor of what the model can represent. It might be possible to make some more quantitative predictions about this in the ReLU output model, which would be cool, though maybe not that important relative to doing more on the effect of dropout on superposition in more realistic models.