We could try to guesstimate how much money Google spends on R&D for Google Translate.
Judging by these data, Google Translate has 500 million daily users, or 11% of the total Internet population worldwide.
I'm not sure how much it costs Google to run a service at that scale, but I would guesstimate that it's on the order of ~$1 bln / year.
If they spend 90% of the total Translate budget on keeping the service online and 10% on R&D, that leaves ~$100 mln / year for R&D, which is likely much more than the total income of DeepL.
It is unlikely that Google is spending ~$100 mln / year on something without a very good reason.
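A minimal sanity check of the arithmetic above, where every input is this comment's rough guess rather than a real Google figure:

```python
# Back-of-envelope version of the estimate above. All inputs are the
# comment's assumptions, not actual Google figures.
daily_users = 500e6            # ~500 million daily users (claimed)
annual_budget_usd = 1e9        # guesstimated total cost: ~$1 bln / year
rd_share = 0.10                # assumed 10% of the budget goes to R&D

rd_budget_usd = annual_budget_usd * rd_share
per_user_usd = annual_budget_usd / daily_users
print(f"Implied R&D budget: ${rd_budget_usd / 1e6:.0f} mln / year")  # -> $100 mln / year
print(f"Implied cost per daily user: ${per_user_usd:.2f} / year")    # -> $2.00 / year
```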
One possible reason is training data. Judging by the same source, Google Translate translates 100 billion words / day, which means Google receives roughly that same massive amount of new user-generated training data every day.
The amount is truly massive: thanks to Google Translate, Google gets roughly ten Libraries of Congress worth of new text per year.
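The "times ten" checks out under some rough assumptions (the Library-of-Congress numbers below are my own loose ballparks, not figures from the source):

```python
# Rough check of the "ten Libraries of Congress per year" claim.
words_per_day = 100e9                    # 100 billion words / day (claimed)
words_per_year = words_per_day * 365     # ~3.65e13 words / year

# Assumed ballparks: ~39 million books in the LoC print collection,
# ~100k words per book. Both are loose, commonly cited approximations.
loc_words = 39e6 * 100_000               # ~3.9e12 words

print(f"~{words_per_year / loc_words:.0f}x the Library of Congress per year")
# -> ~9x the Library of Congress per year
```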
And some of the new data is hard to get by other means (e.g. users trying to translate their personal messages from a low-resource language like Basque or Azerbaijani).
I think your estimate for the operating costs (mostly compute) of running a service that has 500 million daily users (with probably well under 10 requests per user on average, and pretty short average input) is too high by 4 orders of magnitude, maybe even 5. My main uncertainty is how exactly they're servicing each request (i.e. what % of requests are novel enough to actually need to get run through some more-expensive ML model instead of just returning a cached result).
5 billion requests per day translates to ~58k requests per second. You might have trouble standing up a service that handles that level of traffic on a $10k/year compute budget if you buy compute on a public cloud (e.g. AWS/GCP), but those have pretty fat margins, so the direct cost to Google itself would be a lot lower.
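For concreteness, here is the arithmetic behind that figure, plus the per-request cost implied by the parent's ~$1 bln / year guess (the 5 bln requests / day comes from treating the 500M daily users as ~10 requests each, an upper bound):

```python
# Requests per second, and the per-request cost implied by a ~$1 bln / year
# operating budget. Both inputs are the thread's assumptions, not real data.
requests_per_day = 5e9
rps = requests_per_day / 86_400
print(f"~{rps / 1e3:.0f}k requests / second")     # -> ~58k requests / second

implied_cost = 1e9 / (requests_per_day * 365)     # parent's ~$1 bln / year guess
print(f"~${implied_cost * 1e6:.0f} per million requests")
# -> ~$548 per million requests: orders of magnitude above typical serving
#    costs for short, heavily cacheable requests.
```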