Ivan Vendrov comments on Seriously, what goes wrong with “reward the agent when it makes you smile”?

Ivan Vendrov 13 Aug 2022 4:23 UTC
7 points
Having thought about this more, I think gradient descent (at least in the modern regime) probably doesn’t select for inner search processes, because it’s not actually biased towards low Kolmogorov complexity. There is more in my standalone post, and here’s a John Maxwell comment making a similar point.