I don’t see why you dismiss multi-skeletal models. If your NN has a ton of unused neurons, why can’t you just deform those unused neurons into an entirely different model, then smoothly turn the other model on while turning the original off? Sure, you’d rarely learn such an intermediate state with SGD, but so what? The loss ridge is allowed to be narrower sometimes.
Thanks so much for your insightful comment, Charlie! I really appreciate it.
I think you totally could do this. Even if it is rare, it can occur with positive probability.
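Here’s a rough cartoon of that construction, just to make the intuition concrete (a minimal sketch of my own, not anything from the post; the shapes and numbers are made up): two small MLPs live in disjoint sets of neurons, skeleton B is a hidden-unit permutation of skeleton A so it computes the same function from a different point in weight space, and a gate alpha on the outputs smoothly hands off from A to B. Every point along the path has the same loss, so the whole handoff stays on the ridge.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(params, x):
    """Tiny two-layer MLP: params = (W1, b1, W2, b2)."""
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2

def init(d_in, d_hidden, d_out):
    return (rng.normal(size=(d_in, d_hidden)) * 0.5,
            np.zeros(d_hidden),
            rng.normal(size=(d_hidden, d_out)) * 0.5,
            np.zeros(d_out))

# Skeleton A: the model the network actually "uses".
model_a = init(3, 16, 1)

# Skeleton B: the same function written into a different set of weights
# (a hidden-unit permutation), standing in for the "entirely different
# model" hiding in the unused neurons.
perm = rng.permutation(16)
W1, b1, W2, b2 = model_a
model_b = (W1[:, perm], b1[perm], W2[perm, :], b2)

def combined(x, alpha):
    """Gate alpha smoothly turns skeleton B on while turning A off."""
    return (1.0 - alpha) * mlp(model_a, x) + alpha * mlp(model_b, x)

x = rng.normal(size=(100, 3))
y = mlp(model_a, x)  # pretend these are training targets that A fits perfectly

for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    loss = np.mean((combined(x, alpha) - y) ** 2)
    print(f"alpha={alpha:.2f}  loss={loss:.8f}")  # flat along the whole path
```

Of course, in the interesting case B would be a genuinely different model that merely matches A on the training distribution; the permuted copy just makes the neutrality of the path exact rather than approximate.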
For example, my model of how natural selection (genetic algorithms, not SGD) consistently creates diversity is that, with sufficiently many draws of descendants, one of the drawn descendants could have turned off the original model and turned on another model in a way that constitutes neutral drift.
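The same point in the natural-selection framing, again just a toy sketch under made-up assumptions (linear "skeletons" standing in for full models, and mutations that only touch the gate): with many descendant draws per generation, accepting any offspring whose fitness hasn’t dropped lets the gate drift between the two skeletons without ever leaving the ridge.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy genome: a gate alpha plus two "skeletons" (here just linear weights).
# Fitness depends only on the gated mixture of the two skeletons' outputs,
# so every value of alpha is neutral as long as both skeletons fit the data.
x = rng.normal(size=(50, 2))
true_w = np.array([1.0, -2.0])
y = x @ true_w

def predict(genome, x):
    alpha, w_a, w_b = genome
    return (1 - alpha) * (x @ w_a) + alpha * (x @ w_b)

def fitness(genome):
    return -np.mean((predict(genome, x) - y) ** 2)

# Skeleton A fits the data; skeleton B is an equally good model sitting in
# "unused" genes. alpha = 0 means B has no effect on behavior yet.
genome = (0.0, true_w.copy(), true_w.copy())
base = fitness(genome)

for gen in range(500):
    # many descendant draws per generation, each mutating only the gate
    candidates = [(float(np.clip(genome[0] + rng.normal(scale=0.05), 0, 1)),
                   genome[1], genome[2]) for _ in range(30)]
    # neutral selection: accept any descendant whose fitness hasn't dropped
    neutral = [g for g in candidates if fitness(g) >= base - 1e-9]
    if neutral:
        genome = neutral[rng.integers(len(neutral))]

print(f"gate drifted to alpha={genome[0]:.2f} "
      f"at unchanged fitness {fitness(genome):.4f}")
```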