I’m having trouble parsing your first sentence. Are you saying that yes, pretty much any neural net architecture will have a simplicity bias, but also that the biases will differ in important ways depending on which specific architecture you pick?
I think I would have predicted that SGD vs. random search would have a pretty minimal effect on inductive biases. My pet theory for why neural nets have a bias towards simplicity is that there are more ways for neural nets to encode simple functions than complex functions, i.e. simple functions occupy larger regions of parameter space. As the OP argues (I think), if this is right, then it makes sense that the choice between SGD and random search doesn’t affect the bias that much, since larger regions of parameter space will also tend to have larger basins of attraction for SGD to roll down into. (As for the justification of my pet theory, well, this is really sketchy, but see my top-level comment below.)
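To make the "larger regions of parameter space" idea concrete, here is a rough toy sketch of how one might check it. This is my own illustration, not anything from the OP: the 3-4-1 ReLU net, the standard Gaussian weight prior, and the names `sample_function` and `function_frequencies` are all assumptions I picked for convenience. It draws random weights, records which boolean function (truth table over all 8 inputs) the net computes, and counts how often each function shows up. The count for a function is a crude estimate of the parameter-space volume that encodes it under this prior.

```python
import random
from collections import Counter
from itertools import product

# Hypothetical toy setup: 3 boolean inputs, 4 ReLU hidden units, 1 thresholded output.
N_IN, N_HIDDEN = 3, 4
INPUTS = list(product([0.0, 1.0], repeat=N_IN))  # all 8 possible inputs


def sample_function(rng):
    """Draw random Gaussian weights and return the truth table the net computes."""
    w1 = [[rng.gauss(0, 1) for _ in range(N_IN)] for _ in range(N_HIDDEN)]
    b1 = [rng.gauss(0, 1) for _ in range(N_HIDDEN)]
    w2 = [rng.gauss(0, 1) for _ in range(N_HIDDEN)]
    b2 = rng.gauss(0, 1)
    bits = []
    for x in INPUTS:
        # ReLU hidden layer
        hidden = [
            max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(w1, b1)
        ]
        # Linear readout, thresholded at zero to get a boolean output
        out = sum(w * h for w, h in zip(w2, hidden)) + b2
        bits.append("1" if out > 0 else "0")
    return "".join(bits)


def function_frequencies(n_samples=20000, seed=0):
    """Count how often each truth table appears across random weight draws."""
    rng = random.Random(seed)
    return Counter(sample_function(rng) for _ in range(n_samples))


if __name__ == "__main__":
    counts = function_frequencies()
    for table, count in counts.most_common(5):
        print(table, count)
    print("distinct functions seen:", len(counts), "of", 2 ** len(INPUTS))
```

If the pet theory is on the right track, I would expect the counts to be very uneven, with a handful of simple-looking truth tables (e.g. the constant ones) taking up a large share of the samples. Obviously a net this tiny says little about real architectures, and it only probes random sampling, not SGD.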