I think Gary Marcus and his crowd are largely ridiculous in their criticism of GPT-2. Nobody coming at the situation with open eyes would deny that this is such a massive leap forward in language models that it makes everything else ever tried look embarrassing. The goalposts have moved so far that the players on the field can’t see them without a good telescope.
However, I do think their criticism highlights some interesting properties of these systems. They’re right that you can pose reasoning problems that Transformer-based language models really struggle with (they seem to have a great deal of difficulty counting characters, for example). The architecture also scales poorly to long samples of text because of the GIGO death spiral problem. That said, properly trained on a toy dataset, these models can do multi-step logical reasoning with a high degree of accuracy (although, it’s worth noting, not over arbitrarily long chains). So they’re certainly not entirely incapable of reproducing symbolic reasoning, but they have several major deficits in this respect.
If anything, Transformer-based language models (TBLMs) remind me of some accounts of people with serious but localized brain damage: people who don’t have intact mental faculties, but can still speak relatively coherently. I think maybe the best model for the situation is that TBLM architectures are *capable of* logic, but not *well suited to* logic. If you train these things only on logical problems, you can force them to learn to model logic, to some extent, but it’s fundamentally kind of an awkward fit. TBLMs are great at sequence problems and just okay at hierarchical reasoning. You can kind of see this in the deficits that remain in the stupidly large 10+ billion parameter models various big tech companies have been training, and in their steeply diminishing returns. Some problems don’t get solved with just a very big transformer.
It may be that some other “brain region” / network architecture is needed to extend these systems to perform well on text-based problem-solving in general. But if that’s where the goalposts are now, we certainly live in exciting times.