My view is that progress probably switched from being net positive to net negative (in expectation) sometime around GPT-3.
We fully agree on this, and so it seems like we don’t have large disagreements on externalities of progress. From our point of view, the cutoff point was probably GPT-2 rather than 3, or some similar event that established the current paradigm as the dominant one.
Regarding the rest of your comment and your other comment here, here are some reasons why we disagree. They’re mostly high level, as a full account would take a lot of detailed discussion of models of scientific and technological progress, which we might cover in some future posts.
In general, we think you’re treating the current paradigm as over-determined. We don’t think that the current paradigm of scaling a single large, generalist deep-learning language model is a necessary trajectory of progress; we think it is a historical contingency.
While the Bitter Lesson might be true, and might be a powerful driver making it easier to work on singleton, generalist, large monolithic systems than on smaller, specialized ones, science doesn’t always (some might say very rarely!) follow the optimal path.
There are many possible paradigms that we could be in, and the current one is among the worse ones for safety. For instance, we could be in a symbolic paradigm, or a paradigm that focuses on factoring problems and using smaller LSTMs to solve them. Of course, there do exist worse paradigms, such as a pure-RL, non-language-based singleton paradigm.
In any case, we think the trajectory of the field was determined once GPT-2 and 3 brought scaling into the limelight, and if those hadn’t happened or the memetics had gone another way, we could be in a very different world.