If an AGI learned the skill of speaking English during training, but then learned the skill of speaking French during deployment, then your hypotheses imply that the implementations of those two language skills will be totally different. And it then gets weirder if they overlap—e.g. if an AGI learns a fact during training which gets stored in its weights, and then reads a correction later on during deployment, do those original weights just stay there?
Idk, this just sounds plausible to me. I think the hope is that the weights encode more general reasoning abilities, and most of the “facts” or “background knowledge” gets moved into memory, but that won’t happen for everything and plausibly there will be this strange separation between the two. But like, sure, that doesn’t seem crazy.
I do expect we reconsolidate into weights through some outer algorithm like gradient descent (and that may not require any human input). If you want to count that as “autonomously editing its weights”, then fine, though I’m not sure how this influences any downstream disagreement.
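To make the kind of thing I'm imagining concrete, here's a toy sketch (the model, the memory format, and the loss are all made up for illustration, not a claim about how a real system would look): facts picked up during deployment sit in an external memory, and an outer loop occasionally distills them into the weights with ordinary gradient descent, with no human in the loop.

```python
# Toy sketch: periodically distill facts accumulated in an external memory
# back into the model's weights via ordinary gradient descent.
# Everything here (model, memory format, loss) is illustrative.
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Stand-in for the weight-based component (the 'general reasoning' part)."""
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.body = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):            # tokens: (batch, seq) of token ids
        h, _ = self.body(self.embed(tokens))
        return self.head(h)               # logits: (batch, seq, vocab)

# "Episodic memory": (prompt, target) token-id pairs collected during deployment,
# e.g. corrections the system read after training ended.
episodic_memory: list[tuple[torch.Tensor, torch.Tensor]] = []

def consolidate(model: TinyLM, memory, steps=100, lr=1e-3):
    """Outer-loop consolidation: fine-tune the weights on the memory contents.
    Roughly the 'sleep' step: move the day's experience into the synapses."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        for prompt, target in memory:                # target: (seq,) of token ids
            logits = model(prompt.unsqueeze(0))[0]   # (seq, vocab)
            loss = loss_fn(logits, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    memory.clear()   # once consolidated into weights, the entries can be dropped
```

The only point of the sketch is that within-deployment "learning" (appending to episodic_memory) and the outer consolidation (SGD on the weights) are two different mechanisms, which is exactly the weights-vs-memory split above.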
Similar dynamics in humans:
Children are apparently better at learning languages than adults; it seems like adults are using some different process to learn languages (though probably not as different as editing memory vs. editing weights)
One theory of sleep is that it is consolidating the experiences of the day into synapses, suggesting that any within-day learning is not relying as much on editing synapses.
Tbc, I also think explicitly meta-learned update rules are plausible—don’t take any of this as “I think this is definitely going to happen” but more as “I don’t see a reason why this couldn’t happen”.
In fact, this seems like the most likely way in which Steve is right that evolution is a bad analogy.
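To give a concrete picture of what "explicitly meta-learned update rules" could look like, here's a toy sketch (the task, the tiny architecture, and unrolled-inner-loop meta-training are placeholder choices, not a prediction): a small network proposes parameter updates from gradients, and that network is itself trained by an outer gradient-descent loop.

```python
# Toy sketch of a meta-learned update rule: a small network ("update_rule")
# decides how to change parameters given their gradients, and that network is
# itself trained by an outer gradient-descent loop. Purely illustrative.
import torch
import torch.nn as nn

update_rule = nn.Sequential(   # maps (grad, param) -> proposed update, per coordinate
    nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1)
)
meta_opt = torch.optim.Adam(update_rule.parameters(), lr=1e-3)

def inner_loss(w, X, y):
    return ((X @ w - y) ** 2).mean()     # toy inner task: linear regression

for meta_step in range(200):             # outer loop: trains the update rule itself
    X = torch.randn(64, 8)
    y = X @ torch.randn(8)
    w = torch.zeros(8, requires_grad=True)

    for _ in range(5):                   # inner loop: the rule edits the "weights"
        loss = inner_loss(w, X, y)
        (grad,) = torch.autograd.grad(loss, w, create_graph=True)
        features = torch.stack([grad, w], dim=-1)    # (8, 2)
        w = w + update_rule(features).squeeze(-1)    # learned update, not plain SGD

    meta_loss = inner_loss(w, X, y)      # how well did the learned rule do?
    meta_opt.zero_grad()
    meta_loss.backward()                 # backprop through the unrolled inner loop
    meta_opt.step()
```

Here the outer loop plays the "outer-loop optimization algorithm" role and the inner loop is the within-lifetime learning it shapes.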
Fwiw I’ve mostly been ignoring the point of whether or not evolution is a good analogy. If you want to discuss that, I want to know what specifically you use the analogy for. For example:
1. I think evolution is a good analogy for how inner alignment issues can arise.
2. I don't think evolution is a good analogy for the process by which AGI is made (if you think that the analogy is that we literally use natural selection to improve AI systems).
It seems like Steve is arguing the second, and I probably agree (depending on what exactly he means, which I’m still not super clear on).
1. I think evolution is a good analogy for how inner alignment issues can arise.
2. I don't think evolution is a good analogy for the process by which AGI is made (if you think that the analogy is that we literally use natural selection to improve AI systems).
Yes, this post is about the process by which AGI is made, i.e. #2. (See “I want to be specific about what I’m arguing against here.”...) I’m not sure what you mean by “literal natural selection”, but FWIW I’m lumping together outer-loop optimization algorithms regardless of whether they’re evolutionary or gradient descent or downhill-simplex or whatever.