I don’t really know a lot about performance metrics for language models. Is there a good reason for believing that LAMBADA scores should be comparable for different languages?