Freshman’s dream sparsity loss
A similar regularizer is known as Hoyer-Square.
Pick a value for k and a small ϵ ≥ 0. Then define the activation function T_{k,ϵ} in the following way. Given a vector x, let b be the value of the k-th largest entry in x. Then define the vector T_{k,ϵ}(x) by
Is the "a" in the following formula a typo?
Oh, yeah, looks like with p=2 this is equivalent to Hoyer-Square. Thanks for pointing that out; I didn’t know this had been studied previously.
And you’re right, that was a typo, and I’ve fixed it now. Thank you for mentioning that!
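For readers who haven't seen the Hoyer-Square regularizer mentioned in this thread, here is a minimal sketch of it in plain Python. It uses the usual definition, the squared L1 norm divided by the squared L2 norm. The function name `hoyer_square` and the choice to raise an error on an all-zero vector are assumptions made for illustration, not anything taken from the post.

```python
def hoyer_square(x):
    """Hoyer-Square sparsity measure: (sum_i |x_i|)^2 / (sum_i x_i^2).

    Ranges from 1 (exactly one nonzero entry) to len(x) (all entries
    equal in magnitude), and is unchanged when x is rescaled.
    """
    l1 = sum(abs(v) for v in x)       # L1 norm
    l2_sq = sum(v * v for v in x)     # squared L2 norm
    if l2_sq == 0:
        # Assumption: treat the all-zero vector as undefined rather than
        # adding a small constant to the denominator.
        raise ValueError("Hoyer-Square is undefined for the zero vector")
    return (l1 * l1) / l2_sq


print(hoyer_square([0.0, 3.0, 0.0, 0.0]))   # 1.0, maximally sparse
print(hoyer_square([1.0, 1.0, 1.0, 1.0]))   # 4.0, maximally dense
print(hoyer_square([2.0, -2.0, 2.0, -2.0])) # 4.0, scale and sign invariant
```

Minimizing this quantity pushes a vector toward having few nonzero entries without shrinking its overall scale, which is what distinguishes it from a plain L1 penalty.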