This is great! Here’s a couple random thoughts:
I think there’s a common habit of conflating “symbolic” with “not brute force” and “not bitter lesson”, but I don’t think that’s right. For example, if I were to write an algorithm that takes a ton of unstructured data and goes off and builds a giant PGM that best explains all that data, I would call that a “symbolic” AI algorithm (because PGMs are kinda discrete / symbolic / etc.), but I would also call it a “statistical” AI algorithm, and I would certainly call it “compatible with The Bitter Lesson”.
(Incidentally, this description is pretty close to my oversimplified caricature description of what the neocortex does.)
(I’m not disputing that “symbolic” and “containing lots of handcrafted domain-specific structure” do often go together in practice today—e.g. Josh Tenenbaum’s papers tend to have both and OpenAI papers tend to have neither—I’m just saying they don’t necessarily go together.)
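To make that more concrete, here’s a toy sketch of the kind of algorithm I have in mind (the data, variable names, and scoring choices below are invented purely for illustration, and real structure learners are far more sophisticated): it’s statistics all the way down, but the artifact it produces is a discrete, symbolic graph.

```python
# Toy sketch: score-based structure learning for a tiny Bayesian network (a PGM).
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Fake "unstructured" binary data with a hidden dependency chain A -> B -> C,
# plus an unrelated variable D.
n = 5000
A = rng.integers(0, 2, n)
B = A ^ (rng.random(n) < 0.1).astype(int)    # B is a noisy copy of A
C = B ^ (rng.random(n) < 0.1).astype(int)    # C is a noisy copy of B
D = rng.integers(0, 2, n)                    # D is independent noise
data = {"A": A, "B": B, "C": C, "D": D}
names = list(data)

def family_bic(child, parents):
    """BIC score of one node given its parent set (higher is better)."""
    X = data[child]
    cfg = np.zeros(n, dtype=int)
    for p in parents:                        # encode each parent configuration as an int
        cfg = cfg * 2 + data[p]
    n_cfg = 2 ** len(parents)
    loglik = 0.0
    for k in range(n_cfg):
        mask = cfg == k
        m = int(mask.sum())
        if m == 0:
            continue
        phat = float(X[mask].mean())
        p = min(max(phat, 1e-6), 1 - 1e-6)   # clip to avoid log(0)
        loglik += m * (phat * np.log(p) + (1 - phat) * np.log(1 - p))
    return loglik - 0.5 * n_cfg * np.log(n)  # penalize extra structure

def creates_cycle(graph, parent, child):
    """Adding parent->child makes a cycle iff child already reaches parent."""
    stack, seen = [child], set()
    while stack:
        u = stack.pop()
        if u == parent:
            return True
        if u not in seen:
            seen.add(u)
            stack.extend(c for c, ps in graph.items() if u in ps)
    return False

# Greedy hill climbing: repeatedly add whichever edge most improves the score.
graph = {v: set() for v in names}
while True:
    best_gain, best_edge = 0.0, None
    for parent, child in itertools.permutations(names, 2):
        if parent in graph[child] or creates_cycle(graph, parent, child):
            continue
        gain = (family_bic(child, graph[child] | {parent})
                - family_bic(child, graph[child]))
        if gain > best_gain:
            best_gain, best_edge = gain, (parent, child)
    if best_edge is None:
        break
    graph[best_edge[1]].add(best_edge[0])

print({child: sorted(parents) for child, parents in graph.items()})
# Typically recovers the A-B-C chain (up to Markov-equivalent edge directions)
# and leaves D unconnected.
```

Nothing in that sketch is handcrafted domain knowledge, and it benefits from more data and more search, yet what it outputs is about as “symbolic” as it gets.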
I don’t have a concrete suggestion of what if anything you should change here, I’m just chatting. :-)
Cognitive-science approach

This is all fine except that I kinda don’t like the term “cognitive science” for what you’re talking about. Maybe it’s just me, but anyway here’s where I’m coming from:
Learning algorithms almost inevitably have the property that the trained models are more complex than the learning rules that create them. For example, compare the code to run gradient descent and train a ConvNet (it’s not very complicated) to the resulting image-classification algorithms as explored in the OpenAI microscope project (they’re much more complicated).
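For a sense of scale, here’s roughly what the simple side of that comparison looks like: a minimal sketch of a generic ConvNet training loop in PyTorch (the dataset and architecture are arbitrary placeholders, not anything canonical).

```python
# A minimal sketch of the "not very complicated" side of the comparison:
# the entire learning rule for a small MNIST ConvNet fits in ~25 lines.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

model = nn.Sequential(                       # a small, generic ConvNet
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(64 * 7 * 7, 10),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

train_data = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
loader = DataLoader(train_data, batch_size=64, shuffle=True)

for epoch in range(3):                       # the whole "learning algorithm"
    for images, labels in loader:
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()                      # gradient descent step
        opt.step()
    print(f"epoch {epoch}: last-batch loss {loss.item():.3f}")
```

Essentially everything interesting about the resulting image classifier lives in the trained weights, which is the part that projects like OpenAI Microscope have to laboriously reverse-engineer.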
I bring this up because “cognitive science” to me has a connotation of “the study of how human brains do all the things that human brains do”, especially adult human brains. After all, that’s the main thing that most cognitive scientists study AFAICT. So if you think that human intelligence is “mostly a trained model”, then you would think that most cognitive science is “mostly akin to OpenAI microscope” as opposed to “mostly akin to PyTorch”, and therefore mostly unnecessary and unhelpful for building HLMI. You don’t have to think that—certainly Gary Marcus & Steven Pinker don’t—but I do (to a significant extent) and at least a few other prominent neuroscientists do too (e.g. Randall O’Reilly). (See “learning-from-scratch-ism” section here, also cortical uniformity here.) So as much as I buy into (what you call) “the cognitive-science approach”, I’m just not crazy about that term, and for my part I prefer to talk about “brain algorithms” or “high-level brain algorithms”. I think “brain algorithms” is more agnostic about the nature of the algorithms, and in particular whether it’s the kinds of algorithms that neuroscientists talk about, versus the kinds of algorithms that computational cognitive scientists & psychologists talk about.
Thanks!
I agree that symbolic doesn’t have to mean not bitter lesson-y (though in practice I think there are often effects in that direction). I might even go a bit further than you here and claim that a system with a significant amount of handcrafted structure might still be bitter lesson-y, under the right conditions. The bitter lesson doesn’t claim that the maximally naive and brute-force method possible will win, but rather that, among competing methods, more computationally-scalable methods will generally win over time (as compute increases). This shouldn’t be surprising: if methods A and B were both appealing enough to receive attention to begin with, then as compute increases drastically, we’d expect whichever of the two leverages compute better to pull ahead. This doesn’t mean that a different method C, which was more naive/brute-force than either A or B but wasn’t remotely competitive with them to begin with, would also pull ahead. Also, insofar as people are hardcoding in things that do scale well with compute (certain types of biases, for instance), that may be more compatible with the bitter lesson than, say, hardcoding in domain knowledge.
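To illustrate the shape of that argument with made-up numbers (the curves below are invented purely for illustration, not measurements of anything):

```python
# Toy illustration of "among competing methods, the more compute-leveraging one
# wins as compute grows". All numbers are invented.
import numpy as np

compute = np.logspace(0, 6, 7)               # hypothetical compute budgets, 1x to 1,000,000x

def perf(base, slope, compute):
    """Invented model: performance grows with log10(compute) at a method-specific rate."""
    return base + slope * np.log10(compute)

method_a = perf(60.0, 2.0, compute)           # competitive early, leverages compute weakly
method_b = perf(55.0, 5.0, compute)           # competitive early, leverages compute strongly
method_c = perf(10.0, 6.0, compute)           # very brute-force, but never in the running

for c, a, b, cc in zip(compute, method_a, method_b, method_c):
    print(f"compute {c:>9.0f}x   A {a:5.1f}   B {b:5.1f}   C {cc:5.1f}")
# B overtakes A once compute is large enough, but C, despite "scaling better",
# never catches up within this compute range.
```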
Part of me also wonders what happens to the bitter lesson if compute really levels off. In such a world, the future gains from leveraging further compute don’t seem as appealing, and it’s possible larger gains can be had elsewhere.