I agree that the processes are different, but I think the analogy still holds well.
SGD doesn’t get to optimize directly over a conveniently factored-out values module. It’s as blind to the details of how it gets results as evolution is, since it can only care about which local twiddles get locally better results.
So it seems to me that SGD should basically build up a cognitive mess that doesn’t get refactored in nice ways when you do further training. Which looks a lot like evolution in the analogy.
Maybe there’s some evidence for this in the difficulty of retraining a language model to generate text in the middle, even though this is apparently easy to do if you train the model to do infilling from the get-go? https://arxiv.org/abs/2207.14255
(I also disagree about ancestral humans not having a concept or sense tracking “multitude of my descendants” / “power of my family” / etc. And indeed some of these are in my values.)
The key difference between evolution and SGD isn’t about locality or efficiency (though I disagree with your characterization of SGD / deep learning as inefficient or inelegant). The key difference is that human evolution involved a two-level optimization process, with evolution optimizing over the learning process + initial reward system of the brain, and the brain learning (optimizing) within lifetime.
Values form within lifetimes, and evolution does not operate on that scale. Thus, the mechanisms available to evolution for it to influence learned values are limited and roundabout.
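To make the two-level structure concrete, here's a toy sketch (entirely my own construction; the names `lifetime_learning` and `fitness` and all the numbers are made up for illustration): the outer "evolution" loop can only mutate a genome that specifies an innate reward function, while the inner "lifetime" loop is the agent learning its own values against that reward. Evolution never touches the learned values directly.

```python
import random

random.seed(0)

def lifetime_learning(reward_weights, steps=200, lr=0.1):
    """Inner loop: the agent nudges its values toward whatever its innate
    reward circuitry reinforces, and returns the values it ends up with."""
    values = [0.0, 0.0]  # e.g. value placed on [food, status]
    for _ in range(steps):
        i = random.randrange(2)
        # The reward for acting on value i is fixed by the genome; the
        # agent just reinforces whatever got rewarded.
        values[i] += lr * reward_weights[i]
    return values

def fitness(values):
    """Outer loop's criterion (a crude stand-in for IGF): evolution only
    sees downstream behavior, here how much the agent pursues food."""
    return values[0]  # only the first learned value contributes to fitness

# Outer loop: hill-climbing over reward genomes, one "generation" at a time.
genome = [0.5, 0.5]
best = fitness(lifetime_learning(genome))
for _ in range(50):
    candidate = [max(0.0, w + random.gauss(0, 0.1)) for w in genome]
    f = fitness(lifetime_learning(candidate))
    if f > best:
        genome, best = candidate, f

# Evolution has tuned the *reward weights*; the agent's values are a
# downstream product that the outer loop never specified directly.
print(genome)
```

The point of the sketch is the indirection: the only lever the outer loop has on the agent's values is the reward wiring, which is exactly the "limited and roundabout" influence described above.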
Ancestral humans had concepts somewhat related to IGF, but they didn’t have IGF itself. That matters a lot for determining whether the sorts of learning process / reward circuit tweaks that evolution applied in the ancestral environment will lead to modern humans forming IGF values that generalize to situations such as maximally donating to sperm banks. Not coincidentally, humans are more likely to value these ancestral-environment-accessible notions than IGF.
There’s also the further difficulty of aligning any RL-esque learning process to valuing IGF specifically: the long time horizons (relative to within-lifetime learning) over which differences in IGF become apparent mean that any possible reward for increasing IGF will be very sparse and will rarely influence an organism’s cognition. Additionally, learning to act coherently over longer time horizons is just generally difficult.
What you’re saying is that evolution optimized over changes to a kind of blueprint-for-a-human (DNA) that does not directly “do” anything like cognition with concepts and values, but which grows, through cell division and later through cognitive learning, into a human that does do things like cognition with concepts and values. This grown human then goes on to exhibit behavior and have an impact on the world. So there is, roughly, a two-stage process:
(1) blueprint → (2) agent → (3) behavior
In contrast, when we optimize over policies in ML, we optimize directly at the level of a kind of cognition-machine (e.g. some neural net architecture) that directly acts in the world, and could, quite plausibly, have concepts and values.
So evolution optimizes at (1), whereas in today’s ML we optimize at (2) and there is nothing really corresponding to (1) in most of today’s ML.
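The contrast can be shown with another toy sketch (again my own construction, not anything from the thread): optimizing at level (2) means the training objective's gradients flow straight into the parameters that constitute the agent's "values", with no genome and no inner learner in between.

```python
def loss(values, target=(1.0, 0.0)):
    """Training objective: squared error between the policy's parameters
    and what the objective rewards."""
    return sum((v - t) ** 2 for v, t in zip(values, target))

values = [0.0, 0.0]  # the cognition-machine's parameters, optimized directly
lr = 0.1
for _ in range(100):
    # Analytic gradient of the squared error, applied directly to the
    # parameters themselves, no blueprint stage in between.
    grads = [2 * (v - t) for v, t in zip(values, (1.0, 0.0))]
    values = [v - lr * g for v, g in zip(values, grads)]

print(values)  # converges toward the target the objective specifies
```

Unlike the evolutionary setup, there is nothing corresponding to level (1) here: the optimizer and the thing that acquires values are operating on the same parameters.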
Did I understand you correctly?
That’s the key mechanistic difference between evolution and SGD. There’s an additional layer here that comes from how that mechanistic difference interacts with the circumstances of the ancestral environment (i.e., that ancestral humans never had an IGF abstraction), which means evolutionary optimization over the human mind blueprint in the ancestral environment would never have produced a blueprint that led to value formation around IGF in the modern environment. This fully explains modern humanity’s misalignment with respect to IGF, which would have happened even in worlds where inner alignment is never a problem for ML systems. Thus, evolutionary analogies tell us ~nothing about whether we should be worried about inner alignment.
(This is even ignoring the fact that IGF seems like a very hard concept to align minds to at all, due to the sparseness of IGF reward signals.)