I am not an expert in ML, but based on some conversations I was following, I heard WuDao's LAMBADA score (an important performance measure for language models) is significantly lower than GPT-3's. I guess the number of parameters isn't everything.
I don't really know a lot about performance metrics for language models. Is there a good reason to believe that LAMBADA scores should be comparable across different languages?
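For anyone unfamiliar with the benchmark, here's a rough sketch (my own illustration, not the official evaluation code) of how a LAMBADA-style score is computed: the model is shown a passage with its final word removed and is scored on whether it predicts that word. The `predict_last_word` callable and the toy examples below are hypothetical stand-ins.

```python
# Sketch of LAMBADA-style last-word prediction accuracy (illustrative only).
from typing import Callable, List, Tuple

def lambada_accuracy(
    examples: List[Tuple[str, str]],          # (context, target last word)
    predict_last_word: Callable[[str], str],  # model under evaluation (hypothetical)
) -> float:
    # Count how often the model's predicted final word matches the target.
    correct = sum(
        predict_last_word(context).strip() == target
        for context, target in examples
    )
    return correct / len(examples)

if __name__ == "__main__":
    # Toy examples, not real benchmark data.
    examples = [
        ("He opened the door and walked into the", "room"),
        ("She reached into her pocket for her", "keys"),
    ]
    # Trivial stand-in "model" that just repeats the last context word.
    dummy_model = lambda context: context.split()[-1]
    print(lambada_accuracy(examples, dummy_model))
```

Since the score depends on the passages chosen and on how cleanly a "last word" can be isolated in a given language, it's not obvious that numbers from an English-language LAMBADA set and a Chinese one measure the same thing.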