As described above, I expect AGI to be a learning algorithm—for example, it should be able to read a book and then have a better understanding of the subject matter. Every learning algorithm you’ve ever heard of—ConvNets, PPO, TD learning, etc. etc.—was directly invented, understood, and programmed by humans. None of them were discovered by an automated search over a space of algorithms. Thus we get a presumption that AGI will also be directly invented, understood, and programmed by humans.
For a post criticizing the use of evolution for end-to-end ML, this post seems to be pretty strawmanish and generally devoid of any grappling with the Bitter Lesson, the end-to-end principle, Clune’s arguments for generativity and the AI-GAs program to soup up self-play for goal generation/curriculum learning, or any actual research on evolving better optimizers, DRL, or SGD itself… Where’s Schmidhuber, Metz, or AutoML-Zero? Are we really going to dismiss PBT evolving populations of agents in the AlphaLeague as just ‘tweaking a few human-legible hyperparameters’? Why isn’t Co-Reyes et al 2021 an example of evolutionary search inventing TD learning, which you claim is absurd and the sort of thing that has never happened?
Thanks for all those great references!

My current thinking is: (1) Outer-loop meta-learning is slow; (2) therefore we shouldn’t expect to get all that many bits of information out of it; (3) therefore it’s a great way to search for parameter settings in a parameterized family of algorithms, but not a great way to do “the bulk of the real design work”, in the sense that programmers could look at the final artifact and say “Man, I have no idea what this algorithm is doing or why it’s learning anything at all, let alone why it’s learning things very effectively”.
Like if I look at a trained ConvNet, it’s telling me: “Hey Steve, take your input pixels, multiply them by this specific giant matrix of numbers, then add this vector, blah blah, and OK now you have a vector, and if the first entry of the vector is much bigger than the other entries, then you’ve got a picture of a tench.” I say “Yeah, that is a picture of a tench, but WTF just happened?” (Unless I’m Chris Olah.) That’s what I think of when I think of the outer loop doing “the bulk of the real design work”.
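To make that concrete, here’s a minimal toy sketch (my own illustration, not a real ConvNet or real ImageNet labels) of the kind of artifact I mean, where all the design content sits in opaque learned numbers:

```python
import numpy as np

# Toy stand-in for a trained classifier: the "design" lives entirely in the
# learned numbers W and b, which the outer optimization produced and which
# are not human-legible.
rng = np.random.default_rng(0)
W = rng.normal(size=(1000, 32 * 32 * 3))   # stand-in for millions of learned weights
b = rng.normal(size=1000)

def classify(pixels: np.ndarray) -> int:
    """Flatten the image, apply the opaque learned map, return the biggest entry's index."""
    scores = W @ pixels.reshape(-1) + b
    return int(np.argmax(scores))           # e.g. index 0 might mean "tench"

image = rng.random((32, 32, 3))
print(classify(image))                      # a label pops out; *why* is the hard part
```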
By contrast, when I look at Co-Reyes et al 2021, I see a search for parameter settings (well, a tree of operations) within a parameterized family of primarily-human-designed algorithms—just what I expected. If I wanted to run the authors’ best and final RL algorithm, I would start by writing probably many thousands of lines of human-written code, all of which come from human knowledge of how RL algorithms should generally work (“...the policy is obtained from the Q-value function using an ε-greedy strategy. The agent saves this stream of transitions...to a replay buffer and continually updates the policy by minimizing a loss function...over these transitions with gradient descent...”). Then, to that big pile of code, I would add one important missing ingredient—the loss function L—containing at most 10^4 bits of information (if I calculated right). This ingredient is indeed designed by an automated search, but it doesn’t have a lot of inscrutable complexity—the authors have no trouble writing down L and explaining intuitively why it’s a sensible choice. Anyway, this is a very different kind of thing than the tench-discovery algorithm above.
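Here’s a rough sketch of that division of labor—my own illustration, not the authors’ actual code. Everything is ordinary human-written RL plumbing (ε-greedy policy from Q-values, replay buffer, gradient-descent updates on a toy stand-in environment), except the one searched ingredient, which I’ve stubbed in as `searched_loss_grad`; here it happens to compute the familiar TD error:

```python
import random
from collections import deque
import numpy as np

N_STATES, N_ACTIONS = 10, 4
GAMMA, EPS, LR = 0.9, 0.1, 0.05
Q = np.zeros((N_STATES, N_ACTIONS))
replay = deque(maxlen=1000)

def searched_loss_grad(q_sa, reward, q_next_max):
    # The one ingredient supplied by the outer-loop search. This particular choice
    # is the gradient of the ordinary TD loss 0.5*(Q(s,a) - (r + gamma*max_a' Q(s',a')))^2.
    return q_sa - (reward + GAMMA * q_next_max)

def act(state):
    # Human-written: epsilon-greedy policy obtained from the Q-value function.
    return random.randrange(N_ACTIONS) if random.random() < EPS else int(np.argmax(Q[state]))

def env_step(state, action):
    # Human-written stand-in environment: random transitions, reward 1 for action 0.
    return random.randrange(N_STATES), float(action == 0)

state = 0
for _ in range(5000):
    action = act(state)
    next_state, reward = env_step(state, action)
    replay.append((state, action, reward, next_state))            # human-written replay buffer
    state = next_state
    s, a, r, s2 = random.choice(replay)                            # sample a stored transition
    Q[s, a] -= LR * searched_loss_grad(Q[s, a], r, np.max(Q[s2]))  # descend the (searched) loss

print(Q.round(2))
```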
Did the Co-Reyes search “invent” TD learning? Well, they searched over a narrow (<2^(10^4)-element) parameterized family of algorithms that included TD learning in it, and one of their searches settled on TD learning as a good option. Consider how few algorithms 2^(10^4) algorithms is, out of the space of all possible algorithms. Isn’t it shocking that TD learning was even an option? No, it’s not shocking, it’s deliberate. The authors already knew that TD learning was good, and when they set up their search space, they made sure that TD learning would be part of it. (“Our search language...should be expressive enough to represent existing algorithms...”). I don’t find anything about that surprising!
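As a back-of-envelope comparison (my own rough numbers, not from the paper): specifying one element of a 2^(10^4)-element family takes at most about 10^4 bits, whereas the weights of even a modest trained ConvNet embody hundreds of millions of bits. That gap is the difference between supplying one ingredient and doing the bulk of the design work.

```python
# Rough, illustrative numbers only (my own estimates, not the authors'):
search_space_bits = 1e4               # a 2**(10**4)-element family -> at most ~10^4 bits
convnet_params = 25_000_000           # ballpark parameter count for a modest ConvNet
convnet_bits = convnet_params * 32    # 32-bit floats

print(f"bits from the outer-loop search: ~{search_space_bits:.0e}")
print(f"bits sitting in trained ConvNet weights: ~{convnet_bits:.0e}")
print(f"ratio: ~{convnet_bits / search_space_bits:.0e}x")
```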
I feel like maybe I was projecting a mood of “outer-loop searches aren’t impressive or important”. I don’t think that! As far as I know, we might be just a few more outer-loop searches away from AGI! (I’m doubtful, but that’s a different story; anyway, it’s certainly possible.) And I did in fact write that I expect this kind of thing to probably be part of the path to AGI. It’s all great stuff, and I didn’t write this blog post because I wanted to belittle it. I wrote it to respond to an idea I’ve heard: that, for example, we could plausibly wind up with an AGI algorithm that’s fundamentally based on reinforcement learning with tree search, while we humans are totally oblivious to that fact, because the algorithm is an opaque black box generating its own endogenous reward signals and doing RL off of them, and we just have no idea about any of this. It takes an awful lot of bits to build a black box that inscrutable, and I don’t think outer-loop meta-learning can feasibly provide that many bits of design complexity, so far as I know. (Again, I’m not an expert, and I’m open to learning.)
“any grappling with the Bitter Lesson”
I’m not exactly sure what you think I’m saying that’s contrary to the Bitter Lesson. My reading of the Bitter Lesson is that it’s a bad idea to write code that describes the object-level complexity of the world, like “tires are black” or “the queen is a valuable chess piece”; rather, we should write learning algorithms that learn the object-level complexity of the world from data. I don’t read the Bitter Lesson as saying that humans should stop trying to write learning algorithms. Every positive example in “The Bitter Lesson” is a human-written learning algorithm.
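To illustrate the distinction I’m drawing (a toy example of my own, not from Sutton’s essay): the objection is to hand-coding object-level facts like piece values, not to humans writing the learning algorithm that estimates those values from data.

```python
# (a) Hand-coded object-level knowledge: the kind of thing the Bitter Lesson warns against.
HANDCRAFTED_VALUES = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def handcrafted_eval(board):
    """board: list of our piece names. The knowledge is baked in by a human."""
    return sum(HANDCRAFTED_VALUES[p] for p in board)

# (b) A human-written *learning algorithm*: people still designed the algorithm,
# but the object-level knowledge (how much a queen is worth) is fit from data.
def fit_piece_values(boards, outcomes, lr=0.01, steps=1000):
    values = {p: 0.0 for p in HANDCRAFTED_VALUES}      # learned, not hand-set
    for _ in range(steps):
        for board, outcome in zip(boards, outcomes):
            error = sum(values[p] for p in board) - outcome
            for p in board:
                values[p] -= lr * error                # gradient step on squared error
    return values
```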
Take something like “Attention Is All You Need” (2017). I think of it as a success story, exactly the kind of research that moves the field of AI forward. But it’s an example of humans inventing a better learning algorithm. Do you think that “Attention Is All You Need” is not part of the path to AGI, but rather a step in the wrong direction? Is “Attention Is All You Need” the modern version of “yet another paper with a better handcrafted chess-position-evaluation algorithm”? If that’s what you think, well, you can make that argument, but I don’t think that argument is “The Bitter Lesson”, at least not under any straightforward reading of it, AFAICT...
It would also be a pretty unusual view, right? Most people think that the invention of transformers is what AI progress looks like, right? (Not that there’s anything wrong with unusual views, I’m just probing to make sure I correctly understand the ML consensus.)
I personally found this post valuable and thought-provoking. Sure, there’s plenty that it doesn’t cover, but it’s already pretty long, so that seems perfectly reasonable.
In particular, I dislike your criticism of it as strawmanish. Perhaps that would be fair if the analogy between RL and evolution were a standard principle in ML. Instead, it’s a vague idea that is often left implicit, or else formulated in idiosyncratic ways. So posts like this one have to do double duty: first outlining and explaining the mainstream viewpoint (often a major task in its own right!), and then criticising it. This is most important precisely in the cases where the defenders of an implicit paradigm don’t have solid articulations of it, making it particularly difficult to understand what they’re actually defending. I think this is such a case.
If you disagree, I’d be curious what you consider a non-strawmanish summary of the RL-evolution analogy. Perhaps Clune’s AI-GA paper? But from what I can tell, opinions of it are rather mixed, and the AI-GA terminology hasn’t caught on.
Just wanted to say that this comment made me add a lot of things to my reading list, so thanks for that (but I’m clearly not well-read enough to go into the discussion).
Further reading:
https://www.reddit.com/r/reinforcementlearning/search/?q=flair%3AMetaRL&include_over_18=on&restrict_sr=on&sort=new
https://www.gwern.net/Backstop#external-links