Thinking about this more, I think gradient descent (at least in the modern regime) probably doesn’t select for inner search processes, because it’s not actually biased towards low Kolmogorov complexity. More in my standalone post, and here’s a John Maxwell comment making a similar point.