alexlyzhov comments on Where is human level on text prediction? (GPTs task)

alexlyzhov 20 Sep 2020 18:49 UTC
2 points
I agree that the difference in datasets between 1BW and PTB makes precise comparisons impossible. Also, the “human perplexity = 12” figure for 1BW is not measured directly. It is extrapolated from the authors' constructed “human judgement score” metric, using the values of both that score and perplexity for pre-2017 language models, and the authors themselves note that the extrapolation is unreliable.
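
To make the extrapolation point concrete, here is a minimal sketch of how such an estimate could be produced. It is not the authors' actual procedure: the linear fit of score against log-perplexity, the function names `fit_line` and `extrapolate_perplexity`, and every number below are assumptions made up for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def extrapolate_perplexity(model_ppl, model_scores, human_score):
    """Estimate the perplexity that would correspond to human_score.

    Assumption (not from the paper): the judgement score is linear in
    log-perplexity across the language models being fitted.
    """
    log_ppl = [math.log(p) for p in model_ppl]
    slope, intercept = fit_line(log_ppl, model_scores)
    # Invert score = slope * log(ppl) + intercept for the human score.
    return math.exp((human_score - intercept) / slope)


if __name__ == "__main__":
    # Hypothetical (perplexity, judgement score) pairs, NOT real measurements.
    model_ppl = [80.0, 60.0, 45.0, 30.0]
    model_scores = [1.0, 1.6, 2.1, 3.0]
    # Hypothetical human score, well outside the range covered by the models.
    human_score = 5.0
    print(extrapolate_perplexity(model_ppl, model_scores, human_score))
```

Because the human score sits far outside the range spanned by the models, small changes in the fitted slope move the estimated perplexity a lot, which is one reason to treat a single number like 12 with caution.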