Nice. I tried to do something similar, except making everything leaky with polynomial tails:
y = (y+torch.sqrt(y**2+scale**2)) * (1+(y+threshold)/torch.sqrt((y+threshold)**2+scale**2)) / 4
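where the first part (y+torch.sqrt(y**2+scale**2)) is a softplus-like smooth ReLU, and the second part (1+(y+threshold)/torch.sqrt((y+threshold)**2+scale**2)) is a leaky cutoff set by threshold (as written, the transition is centered at y = -threshold). The / 4 is just the factor of 1/2 from each part.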
But I don’t think I got such clearly better results, so I’m going to have to read more thoroughly to see what else you were doing that I wasn’t :)
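
For anyone who wants to poke at the shape, here is a minimal scalar sketch of that expression split into its two factors. It uses plain Python floats so it runs without torch; the function names (soft_relu, leaky_gate, leaky_activation) are made up for illustration and are not from my actual code. Swapping math.sqrt for torch.sqrt should give the tensor version.

```python
import math

def soft_relu(y, scale):
    # Smooth ReLU with polynomial tails: (y + sqrt(y^2 + scale^2)) / 2.
    # Approaches y for large positive y and decays like scale^2 / (4|y|) for large negative y.
    return (y + math.sqrt(y * y + scale * scale)) / 2

def leaky_gate(y, threshold, scale):
    # Smooth step from 0 to 1, centered where y + threshold == 0 (i.e. y = -threshold).
    s = y + threshold
    return (1 + s / math.sqrt(s * s + scale * scale)) / 2

def leaky_activation(y, threshold, scale):
    # Product of the two halves; the two factors of 1/2 give the "/ 4" in the one-line version.
    return soft_relu(y, scale) * leaky_gate(y, threshold, scale)
```

With threshold = 0 and scale = 1, leaky_activation(0.0, 0.0, 1.0) is 0.25, the output approaches y for large positive y, and it stays small but strictly positive for large negative y, which is the "leaky" part.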