I was confused until I realized that the “sparsity” this post refers to is activation sparsity, not the more common weight sparsity that you get from L1 penalization of the weights.
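
In case it helps anyone else who got tripped up, here is a minimal sketch of the difference, using a made-up toy layer. The matrix `W`, the input `x`, and the helper `fraction_zero` are my own illustrative assumptions, not anything from the post. Weight sparsity counts zeros among the parameters; activation sparsity counts zeros among the hidden units for a given input.

```python
import numpy as np

def fraction_zero(x, tol=0.0):
    """Fraction of entries whose magnitude is at most tol."""
    x = np.asarray(x)
    return float(np.mean(np.abs(x) <= tol))

# Hypothetical toy layer: fully dense weights (no zero entries).
W = np.array([[1.0, -2.0],
              [-3.0, 0.5],
              [0.25, 4.0]])
x = np.array([1.0, 1.0])

# ReLU activations: negative pre-activations are clamped to zero.
h = np.maximum(W @ x, 0.0)

weight_sparsity = fraction_zero(W)      # zeros among the parameters
activation_sparsity = fraction_zero(h)  # zeros among hidden units for this input
```

Here the weights are completely dense, yet two of the three hidden units are off for this input, so a layer can have sparse activations with no weight sparsity at all. Note that activation sparsity depends on the input, while weight sparsity is a property of the parameters alone.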