Hmm, imagine you get a job doing bicycle repair. After a while, you’ve learned a vocabulary of probably thousands of entities and affordances and interrelationships (the chain, one link on the chain, the way the chain moves, the feel of clicking the chain into place on the gear, what it looks like if a chain is loose, what it feels like to the rider when a chain is loose, if I touch the chain then my finger will be greasy, etc. etc.). All that information is stored in a highly-structured way in your brain (I think some souped-up version of a PGM, but let’s not get into that), such that it can grow to hold a massive amount of information while remaining easily searchable and usable. The problem with working memory is not capacity per se, it’s that it’s not stored in this structured, easily-usable-and-searchable way. So the more information you put there, the more you start getting bogged down and missing things. Ditto with pen and paper, or a recurrent state, etc.
I find it helpful to think about our brain’s understanding as lots of subroutines running in parallel. (Kaj calls these things “subagents”, I more typically call them “generative models”, Kurzweil calls them “patterns”, Minsky calls this idea “society of mind”, etc.) They all mostly just sit around doing nothing. But sometimes they recognize a scenario for which they have something to say, and then they jump in and say it. So in chess, there’s a subroutine that says “If the board position has such-and-such characteristics, it’s worthwhile to consider moving the pawn.” The subroutine sits quietly for months until the board has that position, and then it jumps in and injects its idea. And of course, once you consider moving the pawn, that brings to mind a different board position, and then new subroutines will recognize it, jump in, and have their say, etc.
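The “dormant subroutines” picture can be sketched as a toy rule-firing loop: lots of condition→suggestion pairs sit idle, and only the ones whose trigger matches the current situation jump in. Everything here is illustrative (made-up position features, not a real chess engine):

```python
# Each "subroutine" is a (trigger, suggestion) pair that stays silent
# until its trigger matches the current situation.
rules = [
    (lambda pos: pos["open_file"], "consider moving the rook to the open file"),
    (lambda pos: pos["pawn_can_fork"], "consider moving the pawn"),
    (lambda pos: pos["king_exposed"], "consider castling"),
]

def active_suggestions(position):
    """Return the suggestion of every rule whose trigger fires."""
    return [idea for trigger, idea in rules if trigger(position)]

position = {"open_file": False, "pawn_can_fork": True, "king_exposed": False}
print(active_suggestions(position))  # ['consider moving the pawn']
```

Acting on a suggestion changes `position`, which can then fire different rules on the next pass, matching the “new subroutines recognize it and have their say” dynamic.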
Or if you take an imperfect rule, like “Python code runs the same on Windows and Mac”, the reason we can get by using this rule is because we have a whole ecosystem of subroutines on the lookout for exceptions to the rule. There’s the main subroutine that says “Yes, Python code runs the same on Windows and Mac.” But there’s another subroutine that says “If you’re sharing code between Windows and Mac, and there’s a file path variable, then it’s important to follow such-and-such best practices”. And yet another subroutine is sitting around looking for the presence of system library calls in cross-platform code, etc. etc.
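For the file-path case specifically, one standard best practice the hypothetical subroutine might be pointing at is to build paths with `pathlib` rather than hard-coding a separator (a minimal illustration, not an exhaustive list of cross-platform pitfalls):

```python
from pathlib import Path

# Brittle: bakes in the Windows separator, so it misbehaves on Mac/Linux.
windows_only = "data\\results\\output.txt"

# Portable: pathlib joins components with the correct separator for the
# OS the code is actually running on.
portable = Path("data") / "results" / "output.txt"

# Prints data/results/output.txt on Mac/Linux, data\results\output.txt on Windows.
print(portable)
```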
That’s what it looks like to have knowledge that is properly structured and searchable and usable. I think that’s part of what the trained transformer layers are doing in GPT-3—checking whether any subroutines need to jump in and start doing their thing (or need to stop, or need to proceed to their next step (when they’re time-sequenced)), based on the context of other subroutines that are currently active.
I think that GPT-3 as used today is more-or-less restricted to the subroutines that were used by people in the course of typing text within the GPT-3 training corpus. But if you, Rohin, think about your own personal knowledge of AI alignment, RL, etc. that you’ve built up over the years, you’ve created countless thousands of new little subroutines, interconnected with each other, which only exist in your brain. When you hear someone talking about utility functions, you have a subroutine that says “Every possible policy is consistent with some utility function!”, and it’s waiting to jump in if the person says something that contradicts that. And of course that subroutine is supported by hundreds of other little interconnected subroutines with various caveats and counterarguments and so on.
Anyway, what’s the bar for an AI to be an AGI? I dunno, but one question is: “Is it competent enough to help with AI alignment research?” My strong hunch is that the AI wouldn’t be all that helpful unless it’s able to add new things to its own structured knowledge base, like new subroutines that say “We already tried that idea and it doesn’t work”, or “This idea almost works but is missing such-and-such ingredient”, or “Such-and-such combination of ingredients would have this interesting property”.
Hmm, well, actually, I guess it’s very possible that GPT-3 is already a somewhat-helpful tool for generating / brainstorming ideas in AI alignment research. Maybe I would use it myself if I had access! I should have said “Is it competent enough to do AI alignment research”. :-D
I agree that your “crux” is a crux, although I would say “effective” instead of “efficient”. I think the inability to add new things to its own structured knowledge base is a limitation on what the AI can do, not just what it can do given a certain compute budget.
This feels analogous to “the AGI doesn’t go and run on its own, it operates by changing values in RAM according to the assembly language interpreter hardwired into the CPU chip”. Like, it’s true, but it seems like it’s operating at the wrong level of abstraction.
Hmm, the point of this post is to argue that we won’t make AGI via a specific development path involving the following three ingredients, blah blah blah. Then there’s a second step: “If so, then what? What does that imply about the resulting AGI?” I didn’t talk about that; it’s a different issue. In particular I am not making the argument that “the algorithm’s cognition will basically be human-legible”, and I don’t believe that.
All of that sounds reasonable to me. I still don’t see why you think editing weights is required, as opposed to something like editing external memory.
(Also, maybe we just won’t have AGI that learns by reading books, and instead it will be more useful to have a lot of task-specific AI systems with a huge amount of “built-in” knowledge, similarly to GPT-3. I wouldn’t put this as my most likely outcome, but it seems quite plausible.)
It seems totally plausible to give AI systems an external memory that they can read from / write to, and then the system learns linear algebra without editing weights but with editing memory. Alternatively, you could have a recurrent neural net with a really big hidden state, and then that hidden state could be the equivalent of what you’re calling “synapses”.
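A minimal sketch of the external-memory alternative, with a plain dict standing in for whatever learned retrieval mechanism a real system would use (all names here are hypothetical):

```python
class ExternalMemory:
    """Knowledge store that lives outside the network's weights."""

    def __init__(self):
        self._store = {}

    def write(self, key, value):
        # Learning a new fact edits the memory, not the weights.
        self._store[key] = value

    def read(self, key, default=None):
        return self._store.get(key, default)

memory = ExternalMemory()
# "Learning linear algebra" by writing to memory:
memory.write("det of 2x2 [[a,b],[c,d]]", "ad - bc")
print(memory.read("det of 2x2 [[a,b],[c,d]]"))  # ad - bc
```

The fixed-weight network would decide what to write and what to read back, so new facts accumulate in the store rather than in the weights.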
I agree with Steve that it seems really weird to have these two parallel systems of knowledge encoding the same types of things. If an AGI learned the skill of speaking English during training, but then learned the skill of speaking French during deployment, then your hypotheses imply that the implementations of those two language skills will be totally different. And it then gets weirder if they overlap—e.g. if an AGI learns a fact during training which gets stored in its weights, and then reads a correction later on during deployment, do those original weights just stay there?
I do expect that we will continue to update AGI systems via editing weights in training loops, even after deployment. But this will be more like an iterative train-deploy-train-deploy cycle where each deploy step lasts e.g. days or more, rather than editing weights all the time (as with humans).
Based on this I guess your answer to my question above is “no”: the original fact will get overridden a few days later, and also the knowledge of French will be transferred into the weights eventually. But if those updates occur via self-supervised learning, then I’d count that as “autonomously edit[ing] its weights after training”. And with self-supervised learning, you don’t need to wait long for feedback, so why wouldn’t you use it to edit weights all the time? At the very least, that would free up space in the short-term memory/hidden state.
For my own part I’m happy to concede that AGIs will need some way of editing their weights during deployment. The big question for me is how continuous this is with the rest of the training process. E.g. do you just keep doing SGD, but with a smaller learning rate? Or will there be a different (meta-learned) weight update mechanism? My money’s on the latter. If it’s the former, then that would update me a bit towards Steve’s view, but I think I’d still expect evolution to be a good analogy for the earlier phases of SGD.
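The two candidate mechanisms can be contrasted in a toy sketch. This is pure illustration: the “meta-learned” rule here is just a per-parameter learning rate standing in for a genuinely learned optimizer, and all numbers are made up:

```python
def sgd_update(weights, grads, lr=0.25):
    # Option 1: keep doing ordinary SGD, just with a small learning rate.
    return [w - lr * g for w, g in zip(weights, grads)]

def learned_update(weights, grads, per_param_lr):
    # Option 2: a separate update mechanism whose own parameters (here,
    # a learning rate per weight) were themselves produced by training.
    return [w - lr * g for w, g, lr in zip(weights, grads, per_param_lr)]

weights, grads = [1.0, 2.0], [0.5, -0.5]
print(sgd_update(weights, grads))                     # [0.875, 2.125]
print(learned_update(weights, grads, [0.5, 0.0]))     # [0.75, 2.0]
```

In the second case the per-parameter rates can gate which weights are editable during deployment (note the frozen second weight), which is one way a meta-learned rule could differ qualitatively from uniform SGD.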
Maybe we just won’t have AGI that learns by reading books, and instead it will be more useful to have a lot of task-specific AI systems with a huge amount of “built-in” knowledge, similarly to GPT-3.
If this is the case, then that would shift me away from thinking of evolution as a good analogy for AGI, because the training process would then look more like the type of skill acquisition that happens during human lifetimes. In fact, this seems like the most likely way in which Steve is right that evolution is a bad analogy.
If an AGI learned the skill of speaking English during training, but then learned the skill of speaking French during deployment, then your hypotheses imply that the implementations of those two language skills will be totally different. And it then gets weirder if they overlap—e.g. if an AGI learns a fact during training which gets stored in its weights, and then reads a correction later on during deployment, do those original weights just stay there?
Idk, this just sounds plausible to me. I think the hope is that the weights encode more general reasoning abilities, and most of the “facts” or “background knowledge” gets moved into memory, but that won’t happen for everything and plausibly there will be this strange separation between the two. But like, sure, that doesn’t seem crazy.
I do expect we reconsolidate into weights through some outer algorithm like gradient descent (and that may not require any human input). If you want to count that as “autonomously editing its weights”, then fine, though I’m not sure how this influences any downstream disagreement.
Similar dynamics in humans:
Children are apparently better at learning languages than adults; it seems like adults are using some different process to learn languages (though probably not as different as editing memory vs. editing weights)
One theory of sleep is that it is consolidating the experiences of the day into synapses, suggesting that any within-day learning is not relying as much on editing synapses.
Tbc, I also think explicitly meta-learned update rules are plausible—don’t take any of this as “I think this is definitely going to happen” but more as “I don’t see a reason why this couldn’t happen”.
In fact, this seems like the most likely way in which Steve is right that evolution is a bad analogy.
Fwiw I’ve mostly been ignoring the point of whether or not evolution is a good analogy. If you want to discuss that, I want to know what specifically you use the analogy for. For example:
1. I think evolution is a good analogy for how inner alignment issues can arise.
2. I don’t think evolution is a good analogy for the process by which AGI is made (if you think that the analogy is that we literally use natural selection to improve AI systems).
It seems like Steve is arguing the second, and I probably agree (depending on what exactly he means, which I’m still not super clear on).
1. I think evolution is a good analogy for how inner alignment issues can arise.
2. I don’t think evolution is a good analogy for the process by which AGI is made (if you think that the analogy is that we literally use natural selection to improve AI systems).
Yes this post is about the process by which AGI is made, i.e. #2. (See “I want to be specific about what I’m arguing against here.”...) I’m not sure what you mean by “literal natural selection”, but FWIW I’m lumping together outer-loop optimization algorithms regardless of whether they’re evolutionary or gradient descent or downhill-simplex or whatever.
Thanks again, this is really helpful.