Lol, cool. I tried the “4 minute” challenge (without having read EY’s answer, but having read yours).
Hill-climbing search requires selecting on existing genetic variance, i.e. alleles already in the gene pool. If there isn't a local mutation which changes the eventual fitness of the properties that the genotype unfolds into, then you won't get selection pressure in that direction. Gradient descent, on the other hand, updates live on a bunch of data in fast iterations that modify the parameters themselves. It's like only being able to edit the blueprint for a house, versus being on site during the day and directing the repair crew.
The changes happen online, relative to the actual within-cognition goings-on of the agent (e.g. you see some cheese, go to the cheese, get a policy gradient, and become more likely to do it again). Compare that to having to try out a bunch of pre-existing tweaks to a cheese-bumping-into agent (e.g. a variant that learns faster early in life but then gets sick and dies later), where you can't get detailed control over its responses to specific situations (you can only tweak the initial setup).
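To make the "online" part concrete, here's a minimal REINFORCE-style sketch under toy assumptions (the two-action "cheese" setup, the learning rate, and all names are invented for illustration, not anyone's actual setup):

```python
# Toy REINFORCE sketch: a softmax policy over two actions gets a reward after
# each step, and its parameters are nudged toward whatever it just did in
# proportion to that reward -- "get a policy gradient, become more likely to
# do it again", happening online during the agent's lifetime.
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                       # logits for [go_to_cheese, wander]

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(500):
    probs = softmax(theta)
    action = rng.choice(2, p=probs)
    reward = 1.0 if action == 0 else 0.0  # cheese is rewarding
    # REINFORCE gradient of log pi(action): one_hot(action) - probs
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0
    theta += 0.1 * reward * grad_log_pi   # update the *same* parameters, live

print(softmax(theta))  # probability of "go to cheese" climbs toward 1
```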
Gradient descent is just a fundamentally different operation. You aren't selecting over learning processes which unfold into minds, trying out a finite but large gene pool of variants, and then choosing the most self-replicating ones; you are instead doing local parametric search over whatever changes the outputs on the training data. (RL isn't even differentiable; you aren't running gradients through it directly.) And there isn't even an analogue of "training data" in the evolutionary regime.
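For contrast, here's a toy mutate-and-select loop (purely illustrative; the `fitness` function and all numbers are made up). The only lever is which fixed genotypes get copied and mutated; nothing that happens during an individual's "lifetime" feeds back into its parameters.

```python
# Toy hill-climbing / selection sketch (not a model of real evolution).
# Each "genotype" is a fixed parameter vector; we can only mutate existing
# genotypes and keep the fittest. There is no per-experience update and no
# gradient through anything the agent does during its "life".
import numpy as np

rng = np.random.default_rng(0)

def fitness(genotype):
    # Stand-in for "how well the unfolded agent self-replicates".
    return -np.sum((genotype - 3.0) ** 2)

population = [rng.normal(size=2) for _ in range(20)]
for generation in range(200):
    parents = sorted(population, key=fitness, reverse=True)[:5]   # selection
    offspring = [p + 0.1 * rng.normal(size=2)                     # mutation of what exists
                 for p in parents for _ in range(3)]
    population = parents + offspring

print(max(population, key=fitness))  # drifts toward the optimum, generation by generation
```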
I think I ended up optimizing for “actually get model onto the page in 4 minutes” and not for “explain in a way Scott would have understood.”