That dropout inhibits superposition seems pretty intuitive to me: there’s a sense in which superposition is a ‘delicate’ phenomenon, and adding random multiplicative noise raises the noise floor of what the model can represent. It might be possible to make some more quantitative predictions about this in the ReLU output model, which would be cool, though perhaps not that important relative to studying the dropout/superposition interaction in more realistic models.
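As a rough illustration of the kind of quantitative prediction this might support, here is a small Monte Carlo sketch in the spirit of the ReLU output model (reconstruction `x̂ = ReLU(Wᵀ h)` with `h = W x` and inverted dropout on the hidden units). The specific weight matrices, feature density, and dropout rate are illustrative assumptions of mine, not taken from any particular experiment: a ‘superposed’ solution packs four features into two hidden dimensions as antipodal pairs, while a ‘dedicated’ solution represents only two features, one per dimension.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 4, 2      # 4 sparse features, 2 hidden dimensions (illustrative choice)
S = 0.05         # probability a feature is active (high sparsity)
p_drop = 0.3     # dropout rate on hidden units
N = 20_000       # Monte Carlo samples

# Superposed solution: antipodal pairs share hidden dims
# (features 0/1 on dim 0, features 2/3 on dim 1).
W_super = np.array([[1., -1., 0., 0.],
                    [0.,  0., 1., -1.]])   # shape (m, n)

# Dedicated solution: represent only features 0 and 2, one per dim.
W_ded = np.array([[1., 0., 0., 0.],
                  [0., 0., 1., 0.]])

def loss(W, dropout_p, rng):
    """Mean squared reconstruction error of the ReLU output model."""
    # Sparse inputs: each feature active w.p. S, with value ~ U[0, 1].
    x = (rng.random((N, n)) < S) * rng.random((N, n))
    h = x @ W.T                               # hidden activations, (N, m)
    if dropout_p > 0:
        mask = rng.random((N, m)) >= dropout_p
        h = h * mask / (1.0 - dropout_p)      # inverted dropout
    xhat = np.maximum(h @ W, 0.0)             # ReLU reconstruction
    return float(np.mean((x - xhat) ** 2))

print(f"no dropout:   superposed {loss(W_super, 0.0, rng):.5f}  "
      f"dedicated {loss(W_ded, 0.0, rng):.5f}")
print(f"dropout {p_drop}:  superposed {loss(W_super, p_drop, rng):.5f}  "
      f"dedicated {loss(W_ded, p_drop, rng):.5f}")
```

With these (assumed) parameters, the superposed solution wins easily without dropout, since pair interference only occurs when both features of a pair are simultaneously active (probability S²). Dropout hurts it disproportionately: dropping one hidden unit wipes out two features at once, and the inverted-dropout rescaling adds multiplicative noise to every represented feature, so the relative advantage of superposition shrinks — the sort of effect one would hope to predict in closed form.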