Particularly, https://arxiv.org/abs/2003.02139 takes a look a double descent from the perspective where they argue that parameters are a bad proxy of model complexity/capacity. Rather, effective dimensionality is what we should be plotting against and double descent effectively vanishes (https://arxiv.org/abs/2002.08791) when we use Bayesian model averaging instead of point estimates.
I want to point out some recent work by Andrew Gordon Wilson’s group—https://cims.nyu.edu/~andrewgw/#papers.
Particularly, https://arxiv.org/abs/2003.02139 takes a look a double descent from the perspective where they argue that parameters are a bad proxy of model complexity/capacity. Rather, effective dimensionality is what we should be plotting against and double descent effectively vanishes (https://arxiv.org/abs/2002.08791) when we use Bayesian model averaging instead of point estimates.