After reading this, I tried to imagine what an ML system would have to look like if there really were an equivalent of the kind of overhang that was present in evolution. If we set up the ML analogy so that SGD = evolution, I think it would have to look something like: “There are some parameters which update really, really slowly (DNA) compared to other parameters (neurons). The difference is something like ~1,000,000,000x. Sometimes, all the fast parameters get wiped and the slow parameters update slightly. The process then starts over and the fast parameters start from scratch, because there seems to be ~0 carryover between the information in the fast parameters of the last generation and the fast parameters of the new generation.” In this analogy, the evolutionary-equivalent sharp left turn would be something like: “some of the information from the fast parameters of one generation is distilled down and utilized by the fast parameters of the next generation.” OP touches on this, and this is not what we see in practice, so I agree with OP’s point here.
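To make the fast/slow picture concrete, here is a toy sketch of it. Everything in it is made up for illustration: the function name `run_generations`, the single scalar "parameter" on each timescale, the learning rates, and the `carryover` knob that stands in for the "distilled down" information. The timescale gap is also far smaller than ~1,000,000,000x so that it runs instantly. It is not meant to model any real training setup.

```python
def run_generations(n_generations, carryover=0.0, target=10.0,
                    fast_lr=0.1, fast_steps=5, slow_lr=0.001):
    """Toy two-timescale loop. Returns the end-of-life squared error per generation."""
    slow = 0.0          # "DNA": the innate starting point, nudged once per generation
    prev_fast = None    # final fast parameter of the previous generation
    losses = []
    for _ in range(n_generations):
        # The fast parameter is wiped and re-initialised from the slow parameter.
        fast = slow
        if prev_fast is not None:
            # carryover=0 is the plain wipe; carryover>0 lets part of what the last
            # generation learned leak into the new one (the hypothetical "culture").
            fast = slow + carryover * (prev_fast - slow)
        # Within-lifetime learning: quick gradient steps on (fast - target)^2 / 2.
        for _ in range(fast_steps):
            fast -= fast_lr * (fast - target)
        losses.append((fast - target) ** 2)
        prev_fast = fast
        # Between generations: one tiny step of the slow parameter.
        slow -= slow_lr * (slow - target)
    return losses


plain = run_generations(50, carryover=0.0)
cultured = run_generations(50, carryover=0.9)
print(f"no carryover:   first {plain[0]:.2f}, last {plain[-1]:.2f}")
print(f"with carryover: first {cultured[0]:.2f}, last {cultured[-1]:.2f}")
```

With `carryover=0.0` each generation only benefits from the slow parameter creeping along, so improvement is gradual. With `carryover` above zero, later generations start much closer to the target even though the slow parameter moves at the same rate, which is the kind of jump the analogy is pointing at.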
(I would be curious if anyone has info or a link on how much certain parameters in a network change relative to other parameters. I have heard this discussed in the context of how resilient terminal goals are against SGD.)
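One rough way to look at this yourself, assuming you have two checkpoints of the same network, is to compare the size of the update to the size of the original weights for each named group of parameters. The sketch below uses plain Python lists and made-up group names (`embed`, `head`); the helper `relative_change` is hypothetical, not something from a library.

```python
import math


def relative_change(before, after, eps=1e-12):
    """Per-group relative update size: ||after - before|| / ||before||."""
    result = {}
    for name, old in before.items():
        new = after[name]
        diff_norm = math.sqrt(sum((n - o) ** 2 for n, o in zip(new, old)))
        old_norm = math.sqrt(sum(o ** 2 for o in old))
        # eps only guards against dividing by zero for an all-zero group.
        result[name] = diff_norm / (old_norm + eps)
    return result


# Hypothetical checkpoints: one group did not move, the other moved a lot.
before = {"embed": [3.0, 4.0], "head": [1.0, 0.0]}
after = {"embed": [3.0, 4.0], "head": [0.5, 0.0]}
for name, change in relative_change(before, after).items():
    print(name, round(change, 3))
```

This only says how far each group moved between two snapshots. It does not say anything about why, or about whether a slow-moving group encodes anything like a goal.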
A different analogy I thought of is one where humans deciding on model architecture are the analogue of evolution and the training process itself is like within-lifetime learning. In these terms, to imagine the equivalent of the sharp left turn, we could imagine that we had to keep making new models because of finite “life-spans”, and each time we started over, we used a similar architecture with some tweaks based on how the last generation of models performed (inter-generational shifts in gene frequency). The models gradually improve over time due to humans selecting on the architecture. In this analogy, the equivalent of the culture-based sharp left turn would be if humans started using the models of one generation to curate really good, distilled training data for the next generation. This would let each generation outperform the previous one by a noticeably larger margin, despite only gradual tweaks in architecture occurring between generations.
This is similar to what OP pointed out in talking about “AI iteratively refining its training data”. Although, when the same AI is both generating and using the training data, it feels more analogous to note-taking or refining your thoughts through journaling than to passing on knowledge between generations. I agree with OP’s concern about that leading to weird runaway effects.
I actually find this second version of the analogy, where humans = evolution and SGD/training = within-lifetime learning, somewhat plausible. Of course, it is still missing some of the other pieces needed for a sharp left turn (i.e., the part where I assumed that models had short lifespans, and the fact that in real life models increase in size a lot each generation). Still, it does work as a bi-level optimization process where one of the levels has way more compute and happens way faster. In humans, we can’t really use our brains without reinforcement learning, so this analogy would also mean that deployment is like taking a snapshot of a human brain in a specific state and just initializing that every time.
I am not sure where this analogy breaks or what the implications are for alignment, but I think it avoids some of the flaws of thinking in terms of evolution = SGD. By analogy, that would kind of mean that when we consciously act in ways that go against evolution, but that we think are good, we’re exhibiting outer misalignment.
By analogy, that would also mean that when we voluntarily make choices that we would not consciously endorse, we are exhibiting some level of inner misalignment. I am not sure how I feel about this one; that might be a stretch. It would make a separation between some kind of “inner learning process” in our brains, which is kind of the equivalent of SGD, and the rest of our brains, which is the equivalent of the NN. We can act in accordance with the inner learner, and that connection of neurons will be strengthened, or we can act against it and learn not to do that. Humans don’t really have a “deployment” phase. (Although, if I wanted to be somewhat unkind, I might say that some people do more or less stop actually changing their inner NNs at some point in life and only act based on their context windows.)
I don’t know, let me know what you think.