I agree that they are related. In the context of this discussion, the critical difference between SGD and evolution is somewhat captured by your Assumption 1:
Fixed ‘fitness function’ or objective function mapping genome to continuous ‘fitness score’
Evolution does not directly select/optimize the content of minds. Evolution selects/optimizes genomes based (in part) on how they distally shape what minds learn and what minds do (to the extent that impacts reproduction), with even more indirection caused by selection’s heavy dependence on the environment. All of that creates a ton of optimization “slack”, such that large-brained human minds with language could steer optimization far faster & more decisively than natural selection could. This is what 1a3orn was pointing to earlier with
evolution does not grow minds, it grows hyperparameters for minds. When you look at the actual process for how we actually start to like ice-cream—namely, we eat it, and then we get a reward, and that’s why we like it—then the world looks a lot less hostile, and misalignment a lot less likely.
SGD does not have that slack by default. It acts directly on cognitive content (associations, reflexes, decision-weights), without slack or added indirection. If you control the training dataset/environment, you control what is rewarded and what is penalized, and if you are using SGD, then this lets you directly mold the circuits in the model’s “brain” as desired. That is one of the main alignment-relevant intuitions that gets lost when blurring the evolution/SGD distinction.
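To make that concrete, here is a minimal sketch of what I mean by “directly mold the circuits” (illustrative PyTorch; the model, data, and objective are stand-ins, not any particular training setup):

```python
import torch
import torch.nn as nn

# A stand-in "mind": every parameter below is cognitive content that SGD touches directly.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()

# The dataset we control: inputs paired with the behaviour we want rewarded (1) or penalised (0).
x = torch.randn(64, 16)
y = (x.sum(dim=1, keepdim=True) > 0).float()

for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()   # gradients flow straight into every weight ("circuit")
    opt.step()        # no genome and no within-lifetime-learning stage in between
```

Every weight gets its update straight from the gradient of the loss on data we chose; there is no genome-then-lifetime layer adding slack in between.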
Right. And in the context of these explainer videos, the particular evolution described has the properties which make it near-equivalent to SGD, I’d say?
SGD does not have that slack by default. It acts directly on cognitive content (associations, reflexes, decision-weights), without slack or added indirection. If you control the training dataset/environment, you control what is rewarded and what is penalized, and if you are using SGD, then this lets you directly mold the circuits in the model’s “brain” as desired.
Hmmm, this strikes me as much too strong (especially ‘this lets you directly mold the circuits’).
Remember also that with RLHF, we’re learning a reward model which is something like the more-hardcoded bits of brain-stuff, which is in turn providing updates to the actually-acting artefact, which is something like the more-flexibly-learned bits of brain-stuff.
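Roughly the structure I have in mind (a toy sketch with made-up shapes and no real preference data; actual RLHF uses PPO-style updates plus a KL penalty, not this plain loop):

```python
import torch
import torch.nn as nn

# Stage 1: the reward model -- the "more-hardcoded bits of brain-stuff".
# Pretend it has already been fit to human preference comparisons; here it is simply frozen.
reward_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
for p in reward_model.parameters():
    p.requires_grad_(False)

# Stage 2: the acting artefact -- the "more-flexibly-learned bits", updated against those scores.
policy = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 16))
opt = torch.optim.SGD(policy.parameters(), lr=1e-3)

for _ in range(50):
    obs = torch.randn(32, 8)
    action_repr = policy(obs)                 # what the policy "does"
    score = reward_model(action_repr).mean()  # the hardcoded part judges it
    opt.zero_grad()
    (-score).backward()                       # the flexible part gets shaped to please the judge
    opt.step()
```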
I also think there’s a fair alternative analogy to be drawn like
evolution of genome (including mostly-hard-coded brain-stuff) ~ SGD (perhaps +PBT) of NN weights
within-lifetime-learning of organism ~ in-context something-something of NN
(this is one analogy I commonly drew before RLHF came along.)
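A crude way to picture that split (a toy GRU sketch; the objective and the “in-context something-something” are placeholders, so take the shape rather than the details):

```python
import torch
import torch.nn as nn

# Outer loop ~ evolution of the genome: slow optimisation over the weights themselves.
# (Plain SGD here; population-based training would add another selection layer on top.)
model = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
readout = nn.Linear(16, 8)
outer_opt = torch.optim.SGD(list(model.parameters()) + list(readout.parameters()), lr=1e-3)

for _ in range(200):
    seq = torch.randn(4, 10, 8)                                 # placeholder sequence data
    out, _ = model(seq)
    loss = (readout(out[:, :-1]) - seq[:, 1:]).pow(2).mean()    # next-step prediction as a stand-in objective
    outer_opt.zero_grad()
    loss.backward()
    outer_opt.step()

# Inner "loop" ~ within-lifetime learning: weights frozen, all adaptation lives in the
# hidden state accumulated over whatever context the model is dropped into.
with torch.no_grad():
    lifetime = torch.randn(1, 6, 8)        # a "lifetime" of observations
    _, h = model(lifetime)                 # state shaped by that particular lifetime
    next_obs = torch.randn(1, 1, 8)
    behaviour, _ = model(next_obs, h)      # same weights, different behaviour given different history
```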
So, look, the analogies are loose, but they aren’t baseless.