My guess is that if the OP is right, pretty much any neural net architecture will have a simplicity bias.
I think it’s clearly true that neural net architecture is a huge contributor to inductive biases and comes with a strong simplicity bias. What’s surprising to me is that I would expect a similar thing to be true of SGD, and yet these results seem to indicate that SGD vs. random search has only a pretty minimal effect on inductive biases.
I’m having trouble parsing your first sentence: are you saying that yes, pretty much any neural net architecture will have a simplicity bias, but also that the biases will be importantly different depending on which architecture you pick?
I think I would have predicted that SGD vs. random search would have a pretty minimal effect on inductive biases. My pet theory for why neural nets have a bias towards simplicity is that there are more ways for a neural net to encode simple functions than complex ones, i.e. simple functions occupy larger regions of parameter space. As the OP argues (I think), if this is right, then it makes sense that SGD vs. random search doesn’t affect the bias that much, since larger regions of parameter space will also have larger basins of attraction for SGD to roll down. (As for the justification of my pet theory, well, it’s really sketchy, but see my top-level comment below.)
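To make the "larger regions of parameter space" intuition concrete, here's a minimal sketch of the kind of sampling experiment I have in mind (this is my own illustration, not anything from the OP): sample random weights for a tiny MLP over all boolean inputs, record which truth table each sample implements, and check whether the most frequently hit functions are the simple ones. The helper names and the transition-count complexity proxy are just illustrative choices on my part.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

# All 2^4 = 16 boolean inputs of 4 bits; each sampled net induces one of
# 2^16 possible boolean functions on this input set.
n_bits = 4
X = np.array(
    [[int(b) for b in format(i, f"0{n_bits}b")] for i in range(2 ** n_bits)],
    dtype=float,
)

def random_mlp_function(width=16):
    """Sample a 1-hidden-layer ReLU MLP with Gaussian weights and return the
    boolean function it induces on X, encoded as a 16-character bit string."""
    W1 = rng.normal(size=(n_bits, width))
    b1 = rng.normal(size=width)
    W2 = rng.normal(size=(width, 1))
    b2 = rng.normal(size=1)
    h = np.maximum(X @ W1 + b1, 0.0)      # hidden layer
    out = (h @ W2 + b2).ravel() > 0.0     # thresholded output
    return "".join("1" if o else "0" for o in out)

# Tally how often each function is hit by random parameter draws.
counts = Counter(random_mlp_function() for _ in range(100_000))

def transitions(f):
    """Crude complexity proxy: number of 0/1 switches in the truth table."""
    return sum(a != b for a, b in zip(f, f[1:]))

for f, c in counts.most_common(5):
    print(f, c, "transitions:", transitions(f))
# If the volume story is right, the most frequent functions should be the
# simple ones (constant or near-constant truth tables), i.e. they occupy
# more parameter-space volume.
```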