Ahhhh, yes, thank you that hits the nail on the head!
So I guess my original question has been answered, and I’m left with a new question about whether the analogy to solomonoff induction might be useful here: Simpler functions get more weight in the universal prior because there are more programs that compute them; perhaps simpler functions get more weight in neural network’s implicit prior because there are more parameter-settings that compute them (i.e. bigger region of parameter-space) and maybe both of these facts are true for the same reason.
Ahhhh, yes, thank you that hits the nail on the head!
So I guess my original question has been answered, and I’m left with a new question about whether the analogy to solomonoff induction might be useful here: Simpler functions get more weight in the universal prior because there are more programs that compute them; perhaps simpler functions get more weight in neural network’s implicit prior because there are more parameter-settings that compute them (i.e. bigger region of parameter-space) and maybe both of these facts are true for the same reason.