A lot of your comments are trying to relate this to GPT-3, I think. Maybe things will be clearer if I just directly describe how I think about GPT-3.
The evolution analogy (as I’m defining it) says that “The AGI” is identified as the inner algorithm, not the inner and outer algorithm working together. In other words, if I ask the AGI a question, I don’t need the outer algorithm to be running in the course of answering that question. Of course the GPT-3 trained model is already capable of answering “easy” questions, but I’m thinking here about “very hard” questions that need the serious construction of lots of new knowledge and ideas that build on each other. I don’t think the GPT-3 trained model can do that by itself.
Now for GPT-3, the outer algorithm edits weights, and the inner algorithm edits activations. I am very impressed by the capability of the GPT-3 weights, edited by SGD, to store an open-ended world model of greater and greater complexity as you train it more and more. I am not so optimistic that the GPT-3 activations can do that, without somehow transferring information from activations to weights. And not just for the stupid reason that it has a finite context window. (For example, other transformer models have recurrence.)
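To make the weights-vs-activations split concrete, here is a minimal sketch in PyTorch with a toy model standing in for GPT-3 (the architecture, sizes, and data are made up purely for illustration): the outer loop's optimizer step is the only place weights change, while a forward pass at inference time only produces activations that vanish when the call returns.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, d = 100, 32

# Toy stand-in for the GPT-3 architecture; sizes are arbitrary.
model = nn.Sequential(nn.Embedding(vocab, d), nn.Linear(d, vocab))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

tokens = torch.randint(0, vocab, (16,))   # a fake snippet of training text
inputs, targets = tokens[:-1], tokens[1:]

# Outer algorithm: edits weights.
logits = model(inputs)                    # activations (temporary)
loss = nn.functional.cross_entropy(logits, targets)
loss.backward()
opt.step()                                # the only place weights change
opt.zero_grad()

# Inner algorithm: edits activations only.
with torch.no_grad():
    logits = model(inputs)                # weights untouched; everything
    # computed here (the "activations") is discarded once the call returns
```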
Why don’t I think that the GPT-3 trained model is just as capable of building out an open-ended world-model of ever greater complexity using activations not weights?
For one thing, it strikes me as a bit weird to think that there will be this centaur-like world model constructed out of X% weights and (100-X)% activations. And what if GPT comes to realize that one of its previous beliefs is actually wrong? Can the activations somehow act as if they’re overwriting the weights? Just seems weird. How much information content can you put in the activations anyway? I don’t know off the top of my head, but much less than the amount you can put in the weights.
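For a rough sense of scale, here is a back-of-envelope comparison using the publicly reported GPT-3 figures (175B parameters, 96 layers, hidden size 12288, 2048-token context). It ignores attention key/value caches and MLP intermediates, so treat it as order-of-magnitude only, but it supports "much less":

```python
# Rough, publicly reported GPT-3 (175B) figures; order-of-magnitude only.
n_weights = 175e9
n_layers, d_model, context = 96, 12288, 2048

# One forward pass holds roughly one d_model-sized activation vector per token
# per layer (ignoring attention key/value caches and MLP intermediates).
n_activations = n_layers * d_model * context      # ~2.4e9 values

print(f"weights:     {n_weights:.1e}")
print(f"activations: {n_activations:.1e}")
print(f"ratio:       ~{n_weights / n_activations:.0f}x")   # roughly 70x
```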
When I think of the AGI-hard part of “learning”, I think of building a solid bedrock of knowledge and ideas, such that you can build new ideas on top of the old ideas, in an arbitrarily high tower. That’s the part that I don’t think the GPT-3 inner algorithm (the trained model) can do by itself. (The outer algorithm obviously does it.) Again, I think you would need to somehow transfer information from the activations to the weights, maybe by doing something vaguely like amplification, if you were to make a real-deal AGI from something like GPT-3.
My human brain analogy for GPT-3: One thing we humans do is build a giant interconnected predictive world-model by editing synapses over the course of our lifetimes. Another thing we do is flexibly combine the knowledge and ideas we already have, on the fly, to make sense of a new input, including using working memory and so on. Don’t get me wrong, this is a really hard and impressive calculation, and it can do lots of things—I think it amounts to searching over this vast combinatorial space of compositional probabilistic generative models (see analysis-by-synthesis discussion here, or also here). But it does not involve editing synapses. It’s different. You’ve never seen nor imagined a “banana hat” in your life, but if you saw one, you would immediately understand what it is, how to manipulate it, roughly how much it weighs, etc., simply by snapping together a bunch of your existing banana-related generative models with a bunch of your existing hat-related generative models into some composite which is self-consistent and maximally consistent with your visual inputs and experience. You can do all that and much more without editing synapses.
Anyway, my human brain analogy for GPT-3 is: I think the GPT-3 outer algorithm is more-or-less akin to editing synapses, and the GPT-3 inner algorithm is more-or-less akin to the brain’s inference-time calculation (...but if humans had a more impressive working memory than we actually do).
The inference-time calculation is impressive but only goes so far. You can’t learn linear algebra without editing synapses. There are just too many new concepts built on top of each other, and too many new connections to be learned.
If you were to turn GPT-3 into an AGI, the closest version consistent with my current expectations would be that someone took the GPT-3 trained model but somehow inserted some kind of online-learning mechanism to update the weights as it goes (again, maybe amplification or whatever). I’m willing to believe that something like that could happen, and it would not qualify as “evolution analogy” by my definition.
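Here is a minimal sketch of what such an online-learning mechanism could look like, assuming nothing fancier than continuing the self-supervised next-token loss on text the deployed model encounters; the toy model and hyperparameters are placeholders, and the real mechanism could of course be amplification or something else entirely.

```python
import torch
import torch.nn as nn

vocab, d = 100, 32
model = nn.Sequential(nn.Embedding(vocab, d), nn.Linear(d, vocab))  # "trained" model
opt = torch.optim.SGD(model.parameters(), lr=1e-3)  # small deployment-time step size

def deploy_step(new_tokens):
    """Answer-time forward pass PLUS a weight update on what was just read.
    This is the hypothetical ingredient that moves information from
    activations into weights during deployment."""
    logits = model(new_tokens[:-1])
    loss = nn.functional.cross_entropy(logits, new_tokens[1:])
    loss.backward()
    opt.step()
    opt.zero_grad()
    return logits   # the usual inner-algorithm output

deploy_step(torch.randint(0, vocab, (16,)))
```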
Even for humans it doesn’t seem right to say that the brain’s equivalent of backprop is the algorithm that “reads books and watches movies” etc.; it seems like backprop created a black-box-ish capability of “learning from language” that we can then invoke to learn faster.
Learning algorithms always involve an interaction between the algorithm itself and what-has-been-learned-so-far, right? Even gradient descent takes a different step depending on the current state of the model-in-training. Again see the “Inner As AGI” criterion near the top for why this is different from the thing I’m arguing against. The “learning from language” black box here doesn’t go off and run on its own; it learns new things by editing synapses according to the synapse-editing algorithm hardwired into the genome.
Thanks, this was helpful in understanding where you’re coming from.
When I think of the AGI-hard part of “learning”, I think of building a solid bedrock of knowledge and ideas, such that you can build new ideas on top of the old ideas, in an arbitrarily high tower.
I don’t feel like humans meet this bar. Maybe mathematicians, and even then, I probably still wouldn’t agree. Especially not humans without external memory (e.g. paper). But presumably such humans still count as generally intelligent.
Anyway, my human brain analogy for GPT-3 is: I think the GPT-3 outer algorithm is more-or-less akin to editing synapses, and the GPT-3 inner algorithm is more-or-less akin to the brain’s inference-time calculation (...but if humans had a more impressive working memory than we actually do).
Seems reasonable.
The inference-time calculation is impressive but only goes so far. You can’t learn linear algebra without editing synapses. There are just too many new concepts built on top of each other, and too many new connections to be learned.
I think this makes sense in the context of humans but not in the context of AI (if you say weights = synapses). It seems totally plausible to give AI systems an external memory that they can read from / write to, and then you learn linear algebra without editing weights but with editing memory. Alternatively, you could have a recurrent neural net with a really big hidden state, and then that hidden state could be the equivalent of what you’re calling “synapses”.
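To make the external-memory option concrete, here is a toy sketch (the class, the retrieval scheme, and the frozen "encoder" are all made-up illustrations, not a claim about how a real system would do it): the deployed model appends key/value pairs to a store and retrieves them by similarity, with no weight edits anywhere.

```python
import numpy as np

rng = np.random.default_rng(0)

class ExternalMemory:
    """Append-only key/value store the deployed model can write to and read
    from, leaving its weights completely untouched."""
    def __init__(self):
        self.keys, self.values = [], []

    def write(self, key_vec, value):
        self.keys.append(key_vec)
        self.values.append(value)

    def read(self, query_vec, k=3):
        if not self.keys:
            return []
        keys = np.stack(self.keys)
        sims = keys @ query_vec / (
            np.linalg.norm(keys, axis=1) * np.linalg.norm(query_vec) + 1e-9)
        return [self.values[i] for i in np.argsort(-sims)[:k]]

# Frozen projection standing in for the trained model's embedding of a fact.
W_frozen = rng.normal(size=(8, 64))
embed = lambda x: W_frozen @ x

memory = ExternalMemory()
fact_vec = rng.normal(size=64)   # stand-in for "a linear algebra fact" just read
memory.write(embed(fact_vec), "det(AB) = det(A) * det(B)")
print(memory.read(embed(fact_vec)))   # retrieved later, with no weight edit
```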
The “learning from language” black box here doesn’t go off and run on its own; it learns new things by editing synapses according to the synapse-editing algorithm hardwired into the genome.
This feels analogous to “the AGI doesn’t go and run on its own, it operates by changing values in RAM according to the assembly language interpreter hardwired into the CPU chip”. Like, it’s true, but it seems like it’s operating at the wrong level of abstraction.
Once you’ve reached the point of creating schools and courses, and using spaced repetition and practice exercises, you probably don’t want to be thinking in terms of “this is all stuff that’s been done by the synapse-editing algorithm hardwired into the genome”; you’ve shifted to a qualitatively new kind of learning.
----
It seems like a central crux here is:
Is it possible to build a reasonably efficient AGI that doesn’t autonomously edit its weights after training?
(By AGI here I mean something about as capable as humans on a variety of tasks.)
Caveats on my “yes” position:
I wouldn’t be that surprised if in practice it turns out that continually editing the weights even at deployment time is the most efficient thing to do, but I would be surprised if the difference is many orders of magnitude.
I do expect that we will continue to update AGI systems via editing weights in training loops, even after deployment. But this will be more like an iterative train-deploy-train-deploy cycle where each deploy step lasts e.g. days or more, rather than editing weights all the time (as with humans).
Thanks again, this is really helpful.

Hmm, imagine you get a job doing bicycle repair. After a while, you’ve learned a vocabulary of probably thousands of entities and affordances and interrelationships (the chain, one link on the chain, the way the chain moves, the feel of clicking the chain into place on the gear, what it looks like if a chain is loose, what it feels like to the rider when a chain is loose, if I touch the chain then my finger will be greasy, etc. etc.). All that information is stored in a highly-structured way in your brain (I think some souped-up version of a PGM, but let’s not get into that), such that it can grow to hold a massive amount of information while remaining easily searchable and usable. The problem with working memory is not capacity per se, it’s that it’s not stored in this structured, easily-usable-and-searchable way. So the more information you put there, the more you start getting bogged down and missing things. Ditto with pen and paper, or a recurrent state, etc.
I find it helpful to think about our brain’s understanding as lots of subroutines running in parallel. (Kaj calls these things “subagents”, I more typically call them “generative models”, Kurzweil calls them “patterns”, Minsky calls this idea “society of mind”, etc.) They all mostly just sit around doing nothing. But sometimes they recognize a scenario for which they have something to say, and then they jump in and say it. So in chess, there’s a subroutine that says “If the board position has such-and-such characteristics, it’s worthwhile to consider moving the pawn.” The subroutine sits quietly for months until the board has that position, and then it jumps in and injects its idea. And of course, once you consider moving the pawn, that brings to mind a different board position, and then new subroutines will recognize it, jump in, and have their say, etc.
Or if you take an imperfect rule, like “Python code runs the same on Windows and Mac”, the reason we can get by using this rule is because we have a whole ecosystem of subroutines on the lookout for exceptions to the rule. There’s the main subroutine that says “Yes, Python code runs the same on Windows and Mac.” But there’s another subroutine that says “If you’re sharing code between Windows and Mac, and there’s a file path variable, then it’s important to follow such-and-such best practices”. And yet another subroutine is sitting around looking for the presence of system library calls in cross-platform code, etc. etc.
That’s what it looks like to have knowledge that is properly structured and searchable and usable. I think that’s part of what the trained transformer layers are doing in GPT-3—checking whether any subroutines need to jump in and start doing their thing (or need to stop, or need to proceed to their next step (when they’re time-sequenced)), based on the context of other subroutines that are currently active.
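A toy illustration of that picture (purely schematic; nothing here is meant to mirror actual transformer internals): a pool of condition-to-suggestion “subroutines” that stay silent until the current context matches their trigger, reusing the cross-platform-Python examples from above.

```python
# Each "subroutine" is a trigger condition plus something to inject when it fires.
subroutines = [
    (lambda ctx: "windows" in ctx and "mac" in ctx,
     "Python code mostly runs the same on Windows and Mac."),
    (lambda ctx: "file path" in ctx,
     "Use pathlib / os.path rather than hard-coded separators."),
    (lambda ctx: "system library" in ctx,
     "Careful: this call may not be cross-platform."),
]

def active_suggestions(context: str):
    """Return whatever the currently-matching subroutines want to inject."""
    ctx = context.lower()
    return [msg for condition, msg in subroutines if condition(ctx)]

print(active_suggestions(
    "Sharing this code between Windows and Mac; it builds a file path string"))
```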
I think that GPT-3 as used today is more-or-less restricted to the subroutines that were used by people in the course of typing text within the GPT-3 training corpus. But if you, Rohin, think about your own personal knowledge of AI alignment, RL, etc. that you’ve built up over the years, you’ve created countless thousands of new little subroutines, interconnected with each other, which only exist in your brain. When you hear someone talking about utility functions, you have a subroutine that says “Every possible policy is consistent with some utility function!”, and it’s waiting to jump in if the person says something that contradicts that. And of course that subroutine is supported by hundreds of other little interconnected subroutines with various caveats and counterarguments and so on.
Anyway, what’s the bar for an AI to be an AGI? I dunno, but one question is: “Is it competent enough to help with AI alignment research?” My strong hunch is that the AI wouldn’t be all that helpful unless it’s able to add new things to its own structured knowledge base, like new subroutines that say “We already tried that idea and it doesn’t work”, or “This idea almost works but is missing such-and-such ingredient”, or “Such-and-such combination of ingredients would have this interesting property”.
Hmm, well, actually, I guess it’s very possible that GPT-3 is already a somewhat-helpful tool for generating / brainstorming ideas in AI alignment research. Maybe I would use it myself if I had access! I should have said “Is it competent enough to do AI alignment research”. :-D
I agree that your “crux” is a crux, although I would say “effective” instead of “efficient”. I think the inability to add new things to its own structured knowledge base is a limitation on what the AI can do, not just what it can do given a certain compute budget.
This feels analogous to “the AGI doesn’t go and run on its own, it operates by changing values in RAM according to the assembly language interpreter hardwired into the CPU chip”. Like, it’s true, but it seems like it’s operating at the wrong level of abstraction.
Hmm, the point of this post is to argue that we won’t make AGI via a specific development path involving the following three ingredients, blah blah blah. Then there’s a second step: “If so, then what? What does that imply about the resulting AGI?” I didn’t talk about that; it’s a different issue. In particular I am not making the argument that “the algorithm’s cognition will basically be human-legible”, and I don’t believe that.
All of that sounds reasonable to me. I still don’t see why you think editing weights is required, as opposed to something like editing external memory.
(Also, maybe we just won’t have AGI that learns by reading books, and instead it will be more useful to have a lot of task-specific AI systems with a huge amount of “built-in” knowledge, similarly to GPT-3. I wouldn’t put this as my most likely outcome, but it seems quite plausible.)
It seems totally plausible to give AI systems an external memory that they can read from / write to, and then you learn linear algebra without editing weights but with editing memory. Alternatively, you could have a recurrent neural net with a really big hidden state, and then that hidden state could be the equivalent of what you’re calling “synapses”.
I agree with Steve that it seems really weird to have these two parallel systems of knowledge encoding the same types of things. If an AGI learned the skill of speaking English during training, but then learned the skill of speaking French during deployment, then your hypotheses imply that the implementations of those two language skills will be totally different. And it then gets weirder if they overlap—e.g. if an AGI learns a fact during training which gets stored in its weights, and then reads a correction later on during deployment, do those original weights just stay there?
I do expect that we will continue to update AGI systems via editing weights in training loops, even after deployment. But this will be more like an iterative train-deploy-train-deploy cycle where each deploy step lasts e.g. days or more, rather than editing weights all the time (as with humans).
Based on this I guess your answer to my question above is “no”: the original fact will get overridden a few days later, and also the knowledge of French will be transferred into the weights eventually. But if those updates occur via self-supervised learning, then I’d count that as “autonomously edit[ing] its weights after training”. And with self-supervised learning, you don’t need to wait long for feedback, so why wouldn’t you use it to edit weights all the time? At the very least, that would free up space in the short-term memory/hidden state.
For my own part I’m happy to concede that AGIs will need some way of editing their weights during deployment. The big question for me is how continuous this is with the rest of the training process. E.g. do you just keep doing SGD, but with a smaller learning rate? Or will there be a different (meta-learned) weight update mechanism? My money’s on the latter. If it’s the former, then that would update me a bit towards Steve’s view, but I think I’d still expect evolution to be a good analogy for the earlier phases of SGD.
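For concreteness, here is a minimal sketch of the two options side by side (the "meta-learned" update network is just an illustrative placeholder, not a claim about how such a mechanism would actually be parameterized or trained):

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 16)   # stand-in for the deployed AGI's weights
params = list(model.parameters())
loss = model(torch.randn(4, 16)).pow(2).mean()   # some deployment-time loss
grads = torch.autograd.grad(loss, params)

# Option 1: keep doing plain SGD, just with a smaller learning rate.
with torch.no_grad():
    for p, g in zip(params, grads):
        p -= 1e-4 * g

# Option 2: a meta-learned update rule: a small network maps
# (gradient, current weight) to a weight delta. Its own parameters would be
# trained in an outer meta-learning loop, omitted here.
update_rule = nn.Sequential(nn.Linear(2, 8), nn.Tanh(), nn.Linear(8, 1))
with torch.no_grad():
    for p, g in zip(params, grads):
        inp = torch.stack([g.flatten(), p.flatten()], dim=-1)   # (numel, 2)
        p += update_rule(inp).squeeze(-1).view_as(p)
```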
Maybe we just won’t have AGI that learns by reading books, and instead it will be more useful to have a lot of task-specific AI systems with a huge amount of “built-in” knowledge, similarly to GPT-3.
If this is the case, then that would shift me away from thinking of evolution as a good analogy for AGI, because the training process would then look more like the type of skill acquisition that happens during human lifetimes. In fact, this seems like the most likely way in which Steve is right that evolution is a bad analogy.
If an AGI learned the skill of speaking English during training, but then learned the skill of speaking French during deployment, then your hypotheses imply that the implementations of those two language skills will be totally different. And it then gets weirder if they overlap—e.g. if an AGI learns a fact during training which gets stored in its weights, and then reads a correction later on during deployment, do those original weights just stay there?
Idk, this just sounds plausible to me. I think the hope is that the weights encode more general reasoning abilities, and most of the “facts” or “background knowledge” gets moved into memory, but that won’t happen for everything and plausibly there will be this strange separation between the two. But like, sure, that doesn’t seem crazy.
I do expect we reconsolidate into weights through some outer algorithm like gradient descent (and that may not require any human input). If you want to count that as “autonomously editing its weights”, then fine, though I’m not sure how this influences any downstream disagreement.
Similar dynamics in humans:
Children are apparently better at learning languages than adults; it seems like adults are using some different process to learn languages (though probably not as different as editing memory vs. editing weights)
One theory of sleep is that it is consolidating the experiences of the day into synapses, suggesting that any within-day learning is not relying as much on editing synapses.
Tbc, I also think explicitly meta-learned update rules are plausible—don’t take any of this as “I think this is definitely going to happen” but more as “I don’t see a reason why this couldn’t happen”.
In fact, this seems like the most likely way in which Steve is right that evolution is a bad analogy.
Fwiw I’ve mostly been ignoring the point of whether or not evolution is a good analogy. If you want to discuss that, I want to know what specifically you use the analogy for. For example:
I think evolution is a good analogy for how inner alignment issues can arise.
I don’t think evolution is a good analogy for the process by which AGI is made (if you think that the analogy is that we literally use natural selection to improve AI systems).
It seems like Steve is arguing the second, and I probably agree (depending on what exactly he means, which I’m still not super clear on).
I think evolution is a good analogy for how inner alignment issues can arise.
I don’t think evolution is a good analogy for the process by which AGI is made (if you think that the analogy is that we literally use natural selection to improve AI systems).
Yes this post is about the process by which AGI is made, i.e. #2. (See “I want to be specific about what I’m arguing against here.”...) I’m not sure what you mean by “literal natural selection”, but FWIW I’m lumping together outer-loop optimization algorithms regardless of whether they’re evolutionary or gradient descent or downhill-simplex or whatever.
Thanks!