Planned summary for the Alignment newsletter:

This update to Evan’s <@double descent post@>(@Understanding “Deep Double Descent”@) explains why he thinks double descent is important. Specifically, Evan argues that it shows that inductive biases matter even for large, deep models. In particular, double descent shows that larger models are _simpler_ than smaller models, at least in the overparameterized setting past the interpolation threshold, where models can achieve approximately zero training error. This makes the case for <@mesa optimization@>(@Risks from Learned Optimization in Advanced Machine Learning Systems@) stronger, since mesa optimizers are simple, compressed policies.
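For readers who want to see the interpolation threshold concretely, here is a minimal sketch (not from Evan’s post) using random-feature regression with numpy: test error typically spikes when the number of features is near the number of training points and falls again as the model grows further, even though the larger models fit the training data exactly. The target function, feature counts, and noise level below are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 40, 1000, 10

# Fixed "teacher" function; the specific choice is arbitrary.
w_true = rng.standard_normal(d)
def target(X):
    return np.sin(X @ w_true)

X_train = rng.standard_normal((n_train, d))
X_test = rng.standard_normal((n_test, d))
y_train = target(X_train) + 0.1 * rng.standard_normal(n_train)  # small label noise
y_test = target(X_test)

def relu_features(X, W):
    # Random ReLU features: number of features p stands in for "model size".
    return np.maximum(X @ W, 0.0)

for p in [5, 10, 20, 40, 80, 160, 640]:
    W = rng.standard_normal((d, p))
    Phi_train = relu_features(X_train, W)
    Phi_test = relu_features(X_test, W)
    # lstsq returns the minimum-norm least-squares solution, which is what makes
    # the overparameterized regime (p > n_train) behave well.
    coef, *_ = np.linalg.lstsq(Phi_train, y_train, rcond=None)
    test_mse = np.mean((Phi_test @ coef - y_test) ** 2)
    print(f"p = {p:4d} features (threshold near p = {n_train}): test MSE = {test_mse:.3f}")
```

The exact shape of the curve depends on the noise level and the feature distribution, which is part of why the effect does not always show up cleanly.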
Planned opinion:
As you might have gathered last week, I’m not sold on double descent as a clear, always-present phenomenon, though it certainly is a real effect that occurs in at least some situations. So I tend not to believe counterintuitive conclusions like “larger models are simpler” that are premised on double descent.
Regardless, I expect that powerful AI systems are going to be severely underparameterized, and so I don’t think it really matters that, past the interpolation threshold, larger models are simpler. I don’t think the case for mesa optimization should depend on this; humans are certainly “underparameterized”, but should still count as mesa optimizers.