What you’re saying is that evolution optimized over changes to a kind of blueprint-for-a-human (DNA) that does not directly “do” anything like cognition with concepts and values, but which grows, through cell division and later through cognitive learning, into a human that does do things like cognition with concepts and values. This grown human then goes on to exhibit behavior and have an impact on the world. So there is, roughly, a two-stage process:
(1) blueprint → (2) agent → (3) behavior
In contrast, when we optimize over policies in ML, we optimize directly at the level of a kind of cognition-machine (e.g., some neural net architecture) that itself acts in the world and could, quite plausibly, have concepts and values.
So evolution optimizes at (1), whereas in today’s ML we optimize at (2) and there is nothing really corresponding to (1) in most of today’s ML.
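To make the picture concrete, here’s a minimal toy sketch (everything in it — the quadratic “environment”, the `grow_agent` / `evolve` / `train_policy_sgd` names, the numbers — is made up for illustration, not anyone’s actual proposal). The evolution-style loop only ever mutates and selects blueprints, and each blueprint is scored via the behavior of the agent grown from it by within-lifetime learning; the SGD-style loop applies gradient updates to the acting policy’s weights directly, with nothing playing the role of (1):

```python
import numpy as np

rng = np.random.default_rng(0)

def behavior_score(policy_weights, env_samples):
    """Stand-in for (3): how well the agent's behavior does on some toy task."""
    return -np.mean((env_samples @ policy_weights) ** 2)

def grow_agent(blueprint, lifetime_data):
    """The blueprint (1) never acts; it only specifies an initialization and a
    within-lifetime learning rate, and the agent (2) is grown from it."""
    init_weights, learning_rate = blueprint
    weights = init_weights.copy()
    for x in lifetime_data:
        weights -= learning_rate * 2 * (x @ weights) * x  # within-lifetime learning
    return weights

def evolve(pop_size=20, generations=50):
    """Optimization at level (1): mutate and select blueprints, scoring each
    only via the behavior of the agent grown from it."""
    population = [(rng.normal(size=3), abs(rng.normal(0.0, 0.01))) for _ in range(pop_size)]
    for _ in range(generations):
        lifetime_data = rng.normal(size=(100, 3))
        ranked = sorted(
            population,
            key=lambda bp: behavior_score(grow_agent(bp, lifetime_data), lifetime_data),
            reverse=True,
        )
        parents = ranked[: pop_size // 2]
        population = [
            (w + rng.normal(scale=0.1, size=3), abs(lr + rng.normal(scale=0.005)))
            for (w, lr) in parents
            for _ in range(2)
        ]
    return population[0]

def train_policy_sgd(steps=500, learning_rate=0.01):
    """Optimization at level (2): gradient steps applied directly to the
    weights of the policy that acts; there is no separate blueprint."""
    weights = rng.normal(size=3)
    for _ in range(steps):
        x = rng.normal(size=3)
        weights -= learning_rate * 2 * (x @ weights) * x  # SGD step on the policy itself
    return weights
```

The point of the sketch is purely structural: in the first loop the selection pressure never touches the grown agent’s internals, only the downstream behavior of whatever agent the blueprint happens to grow into, whereas in the second loop the optimizer adjusts the acting policy’s parameters directly.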
Did I understand you correctly?
That’s the key mechanistic difference between evolution and SGD. There’s an additional layer here that comes from how that mechanistic difference interacts with the circumstances of the ancestral environment (i.e., that ancestral humans never had an abstraction for IGF, inclusive genetic fitness), which means evolutionary optimization over the human mind blueprint in the ancestral environment would never have produced a blueprint that led to value formation around IGF in the modern environment. This fully explains modern humanity’s misalignment wrt IGF, and it would have happened even in worlds where inner alignment is never a problem for ML systems. Thus, evolutionary analogies tell us ~nothing about whether we should be worried about inner alignment.
(This is even ignoring the fact that IGF seems like a very hard concept to align minds to at all, due to the sparseness of IGF reward signals.)