If I understood correctly, the model was trained in Chinese and probably quite expensive to train.
Do you know whether these Chinese models usually get “translated” to English, or whether there is a “fair” way of comparing models that were (mainly) trained on different languages (I’d imagine that even the tokenization might be quite different for Chinese)?
In my experience, I haven’t seen a good “translation” process; instead, models are pretrained on bigger and bigger corpora that include more languages.
GPT-3 was trained on data that was mostly English, but (AFAICT) it is able to generate other languages as well.
For some English-dependent metrics (SuperGLUE, Winogrande, LAMBADA, etc.), I expect a model trained primarily on non-English corpora would do worse.
Also, yes, I would expect the tokenization to be quite different for a largely different corpus.
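To make the tokenization point concrete, here’s a minimal sketch (assuming the Hugging Face transformers library and the stock GPT-2 byte-level BPE tokenizer, which was fit to mostly English text) comparing how many tokens roughly the same sentence costs in English versus Chinese:

```python
# Minimal sketch: compare token counts for comparable English and Chinese text
# using an English-heavy byte-level BPE tokenizer (GPT-2's, via Hugging Face).
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

english = "The model was trained on a large corpus."
chinese = "该模型是在大型语料库上训练的。"  # rough Chinese rendering of the same sentence

for label, text in [("English", english), ("Chinese", chinese)]:
    ids = tokenizer.encode(text)
    # With a vocabulary built mostly from English text, Chinese characters tend
    # to fall back to multiple byte-level pieces, so the same content costs
    # noticeably more tokens.
    print(f"{label}: {len(text)} characters -> {len(ids)} tokens")
```

The upshot is that token counts (and therefore effective context length and compute per sentence) can differ substantially across languages, which is one more reason cross-language benchmark comparisons are hard to make “fair.”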