Did you use ghost gradients (gradients that tend to reactivate features stuck at zero)?
Nope. I don't think they would make much difference: at the sparsity loss coefficient I was using, I had ~0% dead neurons (and iirc ghost gradients only kick in once a neuron has been dead for a while). That said, it's on my list of things to try, to see whether it changes the results.
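
For anyone wondering what "dead for a while" means in practice, here is a rough sketch of the kind of bookkeeping I have in mind. It is not the code from my runs and not a reference implementation of ghost gradients; the function name `update_dead_mask` and the `dead_window` threshold are made up for illustration. It just counts how many consecutive batches each feature has gone without firing and flags those past the threshold.

```python
def update_dead_mask(steps_since_fired, activations, dead_window):
    """Track how long each feature has been silent (hypothetical helper).

    steps_since_fired: list of ints, one counter per feature.
    activations: list of rows (one per example) of post-ReLU feature values.
    dead_window: assumed threshold, in batches, after which a feature counts as dead.
    Returns the updated counters and a per-feature boolean dead mask.
    """
    n_features = len(steps_since_fired)
    # A feature "fired" if it was strictly positive on any example in the batch.
    fired = [any(row[j] > 0 for row in activations) for j in range(n_features)]
    # Reset the counter for features that fired, increment it for the rest.
    counters = [0 if fired[j] else steps_since_fired[j] + 1 for j in range(n_features)]
    # Only features silent for at least dead_window batches are flagged as dead.
    dead = [c >= dead_window for c in counters]
    return counters, dead


# Toy run: 3 features, 2 batches, dead after 2 silent batches.
counters = [0, 0, 0]
batch_1 = [[0.5, 0.0, 0.0], [0.1, 0.0, 0.2]]
batch_2 = [[0.3, 0.0, 0.0]]
counters, dead = update_dead_mask(counters, batch_1, dead_window=2)
counters, dead = update_dead_mask(counters, batch_2, dead_window=2)
dead_fraction = sum(dead) / len(dead)
```

In this toy run only the second feature ends up flagged, since the third one fired in the first batch and has only been silent for one. With ~0% of features ever crossing the threshold, any mechanism gated on the dead mask would almost never trigger, which is why I'd expect little effect.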