The story of DeepL might be of relevance.
DeepL has created a translator that is noticeably better than Google Translate. Their translations are often near-flawless.
The interesting thing is: DeepL is a small company with orders of magnitude less compute and data, and far fewer researchers, than Google.
Their small team has beaten Google (!), at Google's own game (!), by means of algorithmic innovation.
Interesting. Can you point to a study by external researchers (not DeepL) that compares DeepL to other systems (such as Google Translate) quantitatively? After a quick search, I could only find one paper, which only tested some tricky idioms in Spanish and didn’t find significant differences between DeepL and Google. (Wikipedia links to this archived page of comparisons conducted by DeepL, but there’s no information about the methodology used, and the differences in performance seem too big to be credible to me.)
The primary source of my quality assessment is my personal experience with both Google Translate and DeepL. I speak three languages and often have to translate between them (two of them, including English, are not my native language).
As I understand it, making such comparisons quantitatively is tricky: there are no standardized metrics, there are many dimensions of translation quality, and the quality depends strongly on the language pair and the input text.
Google Scholar lists a bunch of papers that compare Google Translate and DeepL. I checked a few, and they’re all over the place. For example, one claims that Google is better, another claims that they score the same, and yet another claims that DeepL is better.
My tentative conclusion: by quantitative metrics, DeepL is in the same league as Google Translate, and might be better by some metrics. That is still an impressive achievement for DeepL, considering that they have orders of magnitude less data and compute, and far fewer researchers, than Google.
Do they though? Google is a large company, certainly, but they might not actually give Google Translate researchers a lot of funding. Google gets revenue from translation by offering it as a cloud service, but I found this thread from 2018 where someone said,
Google Translate and Cloud Translation API are two different products doing some similar functions. It is safe to assume they have different algorithms and differences in translations are not only common but expected.
If the free Google Translate and the paid Cloud Translation API are separate products like this, it appears that there is little incentive for Google to improve the algorithms behind the free one.
We could try to guesstimate how much money Google spends on R&D for Google Translate.
Judging by these data, Google Translate has 500 million daily users, or 11% of the total Internet population worldwide.
Not sure how much it costs Google to run a service of that scale, but I would guesstimate it’s on the order of ~$1 bln / year.
If they spend 90% of the total Translate budget on keeping the service online and 10% on R&D, that leaves ~$100 mln / year for R&D, which is likely much more than DeepL’s total income.
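As a minimal sketch, here is that Fermi estimate in code (the $1 bln / year total and the 90/10 split are my guesses from above, not reported figures):

```python
# Back-of-the-envelope estimate of Google Translate's R&D budget.
# Both inputs are guesses from the discussion above, not reported figures.
total_budget = 1e9       # assumed total Translate budget, $ / year
serving_share = 0.90     # assumed share spent on keeping the service online

rnd_budget = total_budget * (1 - serving_share)
print(f"Implied R&D budget: ${rnd_budget:,.0f} / year")  # ~$100,000,000 / year
```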
It is unlikely that Google is spending ~$100 mln / year on something without a very good reason.
One of the possible reasons is training data. Judging by the same source, Google Translate generates 100 billion words / day. That means Google receives roughly the same massive amount of new user-generated training data every day.
The amount is truly massive: thanks to Google Translate, Google gets ten Libraries of Congress’ worth of new data per year.
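To make the scale concrete, here is the arithmetic (the 100 billion words / day figure is from the source above; the words-per-Library-of-Congress number is back-derived from that very comparison, not an independent estimate):

```python
# Data volume implied by the figure cited above.
words_per_day = 100e9                   # 100 billion translated words / day
words_per_year = words_per_day * 365    # ~3.65e13 words / year

# "Ten Libraries of Congress per year" then implies roughly 3.65 trillion
# words per Library of Congress; illustrative only, back-derived from the
# comparison itself rather than measured.
words_per_loc = 3.65e12
print(f"{words_per_year:.2e} words/year ~ {words_per_year / words_per_loc:.0f} LoC/year")
```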
And some of the new data is hard to get by other means (e.g. users trying to translate their personal messages from a low-resource language like Basque or Azerbaijani).
I think your estimate for the operating costs (mostly compute) of running a service that has 500 million daily users (with probably well under 10 requests per user on average, and pretty short average input) is too high by 4 orders of magnitude, maybe even 5. My main uncertainty is how exactly they’re servicing each request (i.e. what % of requests are novel enough to actually need to get run through some more-expensive ML model instead of just returning a cached result).
5 billion requests per day translates to ~58k requests per second. You might have trouble standing up a service that handles that level of traffic on a $10k / year compute budget if you buy compute on a public cloud (e.g. AWS/GCP), but those have pretty fat margins, so the direct cost to Google themselves would be a lot lower.
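A quick sketch of those numbers (the request count and the ~$1 bln / year figure are the assumptions from this thread; the cache-hit question above would further shrink how many requests ever touch an expensive ML model):

```python
# Sanity-check the traffic and per-request cost figures in this thread.
daily_users = 500e6
requests_per_user = 10                 # upper bound assumed in the comment above
requests_per_day = daily_users * requests_per_user      # 5e9 requests / day
requests_per_sec = requests_per_day / 86_400            # seconds per day
print(f"~{requests_per_sec:,.0f} requests / second")    # ~57,870

# Per-request serving cost implied by the earlier ~$1 bln / year guess
# (90% of it on serving); 4-5 orders of magnitude less would put the whole
# budget in the $100k-$10k / year range instead.
serving_budget = 0.9e9
cost_per_request = serving_budget / (requests_per_day * 365)
print(f"~${cost_per_request:.5f} per request")          # ~$0.00049
```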