Does anyone know if double descent happens when you look at the posterior predictive rather than just the output of SGD? I wouldn’t be too surprised if it does, but before we start talking about the Bayesian perspective, I’d like to see evidence that this isn’t just an artifact of using optimization instead of integration.