All of that sounds reasonable to me. I still don’t see why you think editing weights is required, as opposed to something like editing external memory.
(Also, maybe we just won’t have AGI that learns by reading books, and instead it will be more useful to have a lot of task-specific AI systems with a huge amount of “built-in” knowledge, similarly to GPT-3. I wouldn’t put this as my most likely outcome, but it seems quite plausible.)
It seems totally plausible to give AI systems an external memory that they can read from / write to, and then the AI learns linear algebra not by editing weights but by editing memory. Alternatively, you could have a recurrent neural net with a really big hidden state, and then that hidden state could be the equivalent of what you’re calling “synapses”.
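To make the external-memory picture concrete, here’s a toy sketch (everything in it, names included, is invented for illustration, not a claim about how a real system would be built): the weights are frozen after training, and “learning” at deployment just means appending entries to a key-value memory that gets queried at inference time.

```python
# Toy sketch (all names invented here): an agent whose weights are frozen after
# training, and whose post-deployment learning happens entirely by writing to an
# external key-value memory that it reads from at inference time.
import numpy as np

rng = np.random.default_rng(0)
DIM = 16

# Frozen "weights": a fixed projection learned during training (random here).
W_frozen = rng.normal(size=(DIM, DIM))

def embed(text: str) -> np.ndarray:
    """Toy stand-in for the model's encoder: hash words into a fixed vector."""
    v = np.zeros(DIM)
    for tok in text.lower().split():
        v[hash(tok) % DIM] += 1.0          # hash() is stable within one run
    v = W_frozen @ v                       # pass through the frozen weights
    return v / (np.linalg.norm(v) + 1e-8)

memory_keys, memory_values = [], []        # the editable external memory

def memory_write(cue: str, content: str) -> None:
    """'Learning' at deployment time: append to memory, weights untouched."""
    memory_keys.append(embed(cue))
    memory_values.append(content)

def memory_read(query: str) -> str:
    """Retrieve the stored content whose key best matches the query."""
    sims = np.array([k @ embed(query) for k in memory_keys])
    return memory_values[int(np.argmax(sims))]

# The agent "reads a textbook" by writing facts into memory, not weights.
memory_write("matrix multiplication associativity", "(AB)C = A(BC)")
memory_write("rank nullity theorem", "rank(A) + nullity(A) = n")

print(memory_read("is matrix multiplication associative?"))
print(memory_read("what does rank nullity say?"))
```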
I agree with Steve that it seems really weird to have these two parallel systems of knowledge encoding the same types of things. If an AGI learned the skill of speaking English during training, but then learned the skill of speaking French during deployment, then your hypotheses imply that the implementations of those two language skills will be totally different. And it then gets weirder if they overlap—e.g. if an AGI learns a fact during training which gets stored in its weights, and then reads a correction later on during deployment, do those original weights just stay there?
I do expect that we will continue to update AGI systems via editing weights in training loops, even after deployment. But this will be more like an iterative train-deploy-train-deploy cycle where each deploy step lasts e.g. days or more, rather than editing weights all the time (as with humans).
Based on this I guess your answer to my question above is “no”: the original fact will get overridden a few days later, and also the knowledge of French will be transferred into the weights eventually. But if those updates occur via self-supervised learning, then I’d count that as “autonomously edit[ing] its weights after training”. And with self-supervised learning, you don’t need to wait long for feedback, so why wouldn’t you use it to edit weights all the time? At the very least, that would free up space in the short-term memory/hidden state.
For my own part I’m happy to concede that AGIs will need some way of editing their weights during deployment. The big question for me is how continuous this is with the rest of the training process. E.g. do you just keep doing SGD, but with a smaller learning rate? Or will there be a different (meta-learned) weight update mechanism? My money’s on the latter. If it’s the former, then that would update me a bit towards Steve’s view, but I think I’d still expect evolution to be a good analogy for the earlier phases of SGD.
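To spell out the two options I have in mind, here’s a hedged toy sketch on a scalar regression problem (the numbers and function names are made up; the “learned” rule below is just a placeholder, not an actual meta-learned optimizer):

```python
# Sketch of the two deployment-time update options on a toy scalar problem.
# "sgd_finetune" is option 1 (keep doing SGD, just at a smaller learning rate);
# "learned_update" is option 2 (some other rule; here a made-up sign-based rule
# stands in for whatever a meta-learning process might actually find).
import numpy as np

def loss_grad(w, x, y):
    """Gradient of 0.5 * (w*x - y)^2 with respect to w."""
    return (w * x - y) * x

def sgd_finetune(w, data, lr=0.01):
    # Option 1: same optimizer as training, smaller learning rate.
    for x, y in data:
        w -= lr * loss_grad(w, x, y)
    return w

def learned_update(w, data, step=0.05):
    # Option 2 (stand-in): a different update rule than plain SGD.
    for x, y in data:
        w -= step * np.sign(loss_grad(w, x, y))
    return w

rng = np.random.default_rng(1)
true_w = 3.0
deploy_data = [(x, true_w * x) for x in rng.normal(size=50)]

print("SGD fine-tune:        ", sgd_finetune(1.0, deploy_data))
print("Learned-rule stand-in:", learned_update(1.0, deploy_data))
```

A real meta-learned rule would of course itself be the output of an outer optimization over update rules; the sign-based stand-in is only there to show that the deployment-time update needn’t be plain SGD.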
Maybe we just won’t have AGI that learns by reading books, and instead it will be more useful to have a lot of task-specific AI systems with a huge amount of “built-in” knowledge, similarly to GPT-3.
If this is the case, then that would shift me away from thinking of evolution as a good analogy for AGI, because the training process would then look more like the type of skill acquisition that happens during human lifetimes. In fact, this seems like the most likely way in which Steve is right that evolution is a bad analogy.
If an AGI learned the skill of speaking English during training, but then learned the skill of speaking French during deployment, then your hypotheses imply that the implementations of those two language skills will be totally different. And it then gets weirder if they overlap—e.g. if an AGI learns a fact during training which gets stored in its weights, and then reads a correction later on during deployment, do those original weights just stay there?
Idk, this just sounds plausible to me. I think the hope is that the weights encode more general reasoning abilities, and most of the “facts” or “background knowledge” gets moved into memory, but that won’t happen for everything and plausibly there will be this strange separation between the two. But like, sure, that doesn’t seem crazy.
I do expect we reconsolidate into weights through some outer algorithm like gradient descent (and that may not require any human input). If you want to count that as “autonomously editing its weights”, then fine, though I’m not sure how this influences any downstream disagreement.
Similar dynamics in humans:
Children are apparently better at learning languages than adults; it seems like adults are using some different process to learn languages (though probably not as different as editing memory vs. editing weights)
One theory of sleep is that it is consolidating the experiences of the day into synapses, suggesting that any within-day learning is not relying as much on editing synapses.
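Here’s a minimal sketch of that reconsolidation picture (and the sleep analogy): experiences pile up in an episodic buffer during deployment, and a periodic offline gradient-descent pass distills them into the weights and frees the buffer. All the names below are invented for the sketch; nothing here is a claim about the actual mechanism.

```python
# Minimal sketch of "consolidate memory into weights via an outer loop":
# at deployment, experiences accumulate in an episodic buffer; periodically,
# an offline gradient-descent pass distills the buffer into the weights
# and empties it. Toy linear model, all names made up.
import numpy as np

rng = np.random.default_rng(2)
DIM = 8
W = rng.normal(scale=0.1, size=(DIM,))     # "long-term" knowledge: the weights
buffer = []                                # "short-term" knowledge: episodic memory

def predict(x):
    return W @ x

def observe(x, y):
    """Deployment-time learning: store the experience, don't touch W yet."""
    buffer.append((x, y))

def consolidate(lr=0.1, epochs=200):
    """The periodic 'retrain' / 'sleep' phase: SGD over buffered experiences."""
    global W, buffer
    for _ in range(epochs):
        for x, y in buffer:
            W -= lr * (W @ x - y) * x      # gradient of squared error
    buffer = []                            # memory is freed once knowledge is in W

# A day's worth of experience with some unknown linear target.
true_w = rng.normal(size=(DIM,))
for _ in range(20):
    x = rng.normal(size=(DIM,))
    observe(x, true_w @ x)

x_test = rng.normal(size=(DIM,))
print("error before consolidation:", abs(predict(x_test) - true_w @ x_test))
consolidate()
print("error after consolidation: ", abs(predict(x_test) - true_w @ x_test))
```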
Tbc, I also think explicitly meta-learned update rules are plausible—don’t take any of this as “I think this is definitely going to happen” but more as “I don’t see a reason why this couldn’t happen”.
In fact, this seems like the most likely way in which Steve is right that evolution is a bad analogy.
Fwiw I’ve mostly been ignoring the point of whether or not evolution is a good analogy. If you want to discuss that, I want to know what specifically you use the analogy for. For example:
1. I think evolution is a good analogy for how inner alignment issues can arise.
2. I don’t think evolution is a good analogy for the process by which AGI is made (if you think that the analogy is that we literally use natural selection to improve AI systems).
It seems like Steve is arguing the second, and I probably agree (depending on what exactly he means, which I’m still not super clear on).
1. I think evolution is a good analogy for how inner alignment issues can arise.
2. I don’t think evolution is a good analogy for the process by which AGI is made (if you think that the analogy is that we literally use natural selection to improve AI systems).
Yes this post is about the process by which AGI is made, i.e. #2. (See “I want to be specific about what I’m arguing against here.”...) I’m not sure what you mean by “literal natural selection”, but FWIW I’m lumping together outer-loop optimization algorithms regardless of whether they’re evolutionary or gradient descent or downhill-simplex or whatever.
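As a toy illustration of that lumping-together (not something the post depends on, and every name here is invented): the outer loop is just propose / evaluate / update, and swapping an evolutionary step for a gradient step doesn’t change the overall shape of the process.

```python
# Hedged illustration of treating outer-loop optimizers interchangeably:
# the loop repeats propose -> evaluate -> update, whether the update is an
# evolutionary mutate-and-select step or a finite-difference gradient step.
import numpy as np

rng = np.random.default_rng(3)

def fitness(w):
    """Toy objective the outer loop maximizes."""
    return -np.sum((w - 1.5) ** 2)

def evolutionary_step(w, sigma=0.3, pop=32):
    # mutate, evaluate, keep the best candidate
    candidates = w + sigma * rng.normal(size=(pop, w.size))
    return candidates[np.argmax([fitness(c) for c in candidates])]

def gradient_step(w, lr=0.2, eps=1e-4):
    # finite-difference gradient estimate, then ascend it
    grad = np.array([(fitness(w + eps * e) - fitness(w - eps * e)) / (2 * eps)
                     for e in np.eye(w.size)])
    return w + lr * grad

def outer_loop(step_fn, steps=100):
    w = np.zeros(4)
    for _ in range(steps):
        w = step_fn(w)
    return fitness(w)

print("evolutionary outer loop:", outer_loop(evolutionary_step))
print("gradient outer loop:    ", outer_loop(gradient_step))
```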