Thanks!
I agree that symbolic doesn’t have to mean not bitter lesson-y (though in practice I think there are often effects in that direction). I might even go a bit further than you here and claim that a system with a significant number of handcrafted aspects might still be bitter lesson-y, under the right conditions. The bitter lesson doesn’t claim that the most naive, brute-force method possible will win, but rather that, among competing methods, the more computationally scalable ones will generally win over time (as compute increases). This shouldn’t be surprising: if methods A and B were both appealing enough to receive attention to begin with, then as compute increases drastically, we’d expect whichever of the two leverages compute better to pull ahead. This doesn’t mean that a different method C, which was more naive/brute-force than either A or B but wasn’t remotely competitive with them to begin with, would also pull ahead. Also, insofar as people are hardcoding in things that do scale well with compute (maybe certain types of biases, for instance), that may be more compatible with the bitter lesson than, say, hardcoding in domain knowledge.
Part of me also wonders what happens to the bitter lesson if compute really levels off. In such a world, the future gains from leveraging further compute don’t seem as appealing, and it’s possible larger gains can be had elsewhere.