That is closer to what I meant, but it isn’t quite what SLT says. The architecture doesn’t need to be biased toward the target function’s complexity. It just needs to always prefer simpler fits to more complex ones.
This is why the neural redshift paper says something different from SLT. It says neural nets that generalize well don’t just have a simplicity bias; they have a bias toward functions with similar complexity to the target function. This calls mesaoptimization into question, because although mesaoptimization is favored by a simplicity bias, it is not necessarily favored by a bias toward functions whose complexity matches that of the target function.