In my experience, I haven’t seen a good “translation” process—instead, models are pretrained on bigger and bigger corpora that include more languages.
GPT-3 was trained on data that was mostly English, but is also able to (AFAICT) generate other languages.
For some English-dependent benchmarks (SuperGLUE, Winogrande, LAMBADA, etc.), I expect a model trained on a primarily non-English corpus would do worse.
Also, yes, I would expect the tokenization to be different for a largely different corpus.
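To make that last point concrete, here's a minimal sketch (assuming the Hugging Face transformers library and using GPT-2's BPE tokenizer, which was learned from an English-heavy corpus and which GPT-3 largely reuses): non-English sentences of similar length typically get split into noticeably more tokens, and a tokenizer trained on a different corpus would learn different merges.

```python
# Quick check: an English-heavy BPE vocabulary fragments non-English text
# into more tokens per word. Requires `pip install transformers`.
from transformers import AutoTokenizer

# GPT-2's tokenizer, as a stand-in for GPT-3's BPE vocabulary
tokenizer = AutoTokenizer.from_pretrained("gpt2")

samples = {
    "English": "The cat sat on the mat.",
    "German":  "Die Katze saß auf der Matte.",
    "Russian": "Кошка сидела на коврике.",
}

for lang, text in samples.items():
    ids = tokenizer.encode(text)
    # Non-English sentences usually need noticeably more tokens,
    # because the BPE merges were learned mostly from English text.
    print(f"{lang:8s} {len(text.split()):2d} words -> {len(ids):2d} tokens")
```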