There are some problems with this analysis. First of all, translation is natural language processing. What task requires more understanding of natural language than translation? Second, the BLEU score mentioned is only a cheap and imperfect proxy for translation quality. The best measure is human evaluation, and neural machine translation excels at that. Look at this graph. On every language, the neural network is closer to human performance than the previous method. On several languages it’s extremely close to human performance, and its translations would be almost indistinguishable from a human’s. That’s incredible! And it shows that NNs can handle symbolic problems, which the author disputes.
The biggest problem, though, is that all machine learning tasks are expected to show diminishing returns on benchmarks. You can’t do better on a classification task than 0% error, for example. You might have an algorithm that is 10x better than another, yet it may only score 1% better on some chosen dataset, because there isn’t more than 1% of progress left to make! Just looking at benchmark scores is going to seriously underestimate the rate of progress.
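To make the arithmetic concrete, here is a rough sketch in Python. The error rates are made-up numbers chosen only for illustration, and `compare` is a hypothetical helper, not something from the article being discussed. It shows how the same improvement looks tiny as an accuracy gain and large as a reduction in errors.

```python
def compare(baseline_error: float, improved_error: float) -> dict:
    """Compare two classifiers by absolute gain and by relative error reduction.

    Both arguments are error rates in (0, 1]; improved_error must be nonzero.
    """
    return {
        # Absolute gain in percentage points, which is what a benchmark table shows.
        "accuracy_gain_points": (baseline_error - improved_error) * 100,
        # How many times fewer mistakes the improved model makes.
        "error_reduction_factor": baseline_error / improved_error,
    }


# Hypothetical numbers: the baseline gets 1.1% wrong, the new model 0.11%.
result = compare(baseline_error=0.011, improved_error=0.0011)

print(f"accuracy gain: {result['accuracy_gain_points']:.2f} points")
print(f"error reduction: {result['error_reduction_factor']:.1f}x")
```

With these assumed numbers the new model makes about ten times fewer mistakes, but the benchmark score moves by less than one percentage point, because the baseline had already left only about one point to gain.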