delton137 comments on How I’m thinking about GPT-N

delton137 17 Jan 2022 23:27 UTC
1 point
By the way, if you look at Filan et al.’s paper “Clusterability in Neural Networks”, there is a lot of variance in their results, but generally speaking they find that L1 regularization leads to slightly more clusterability than L2 regularization or dropout.