Conjecture: SGD is mathematically equivalent to the Price equation prediction of the effect of natural selection on a population with particular simple but artificial properties. In particular, for any space of parameters and loss function on the parameter space, we can define a fitness landscape and a few other parameters so that the predictions match the Price equation.
I think it would be cool for someone to write this in a LessWrong post. The similarity between the Price equation and the SGD equation is pretty blatant, so I suspect that (if I’m right about this) someone else has written this down before. But I haven’t actually seen it written up.
Conjecture: SGD is mathematically equivalent to the Price equation prediction of the effect of natural selection on a population with particular simple but artificial properties. In particular, for any space of parameters and loss function on the parameter space, we can define a fitness landscape and a few other parameters so that the predictions match the Price equation.
I think it would be cool for someone to write this in a LessWrong post. The similarity between the Price equation and the SGD equation is pretty blatant, so I suspect that (if I’m right about this) someone else has written this down before. But I haven’t actually seen it written up.
An attempt was made last year, as an outgrowth of some assorted shard theory discussion, but I don’t think it got super far:
Price’s equation for neural networks
I think this is related, although not exactly the Price equation https://www.lesswrong.com/posts/5XbBm6gkuSdMJy9DT/conditions-for-mathematical-equivalence-of-stochastic