Playing this game made me realize that humans aren’t trained to predict at the token level. I don’t know the token-level vocabulary, and I made lots of mistakes by missing spaces and punctuation. Is it possible to convert the token-level prediction into word-level prediction? That might give a better picture of human ability.
One way to convert: measure how accurate the LM is at word-level prediction by computing the likelihood it assigns to each possible word. For example, the LM’s likelihood of the word “[token A][token B]” could be p(token A | context) ∗ p(token B | token A, context).
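A minimal sketch of that conversion, assuming a hypothetical next_token_dist function that returns the model's next-token probabilities for a given context. The toy_model table and its numbers are made up purely for illustration and do not come from any real LM.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical interface: maps a token context to a next-token distribution.
NextTokenDist = Callable[[Tuple[str, ...]], Dict[str, float]]


def word_log_prob(
    next_token_dist: NextTokenDist,
    context: Sequence[str],
    word_tokens: Sequence[str],
) -> float:
    """Chain rule: sum of log p(token_i | context, token_1..token_{i-1})."""
    log_p = 0.0
    ctx = tuple(context)
    for tok in word_tokens:
        p = next_token_dist(ctx).get(tok, 0.0)
        if p == 0.0:
            # The model gives this token no probability mass.
            return float("-inf")
        log_p += math.log(p)
        ctx = ctx + (tok,)  # condition the next step on the token just scored
    return log_p


def toy_model(ctx: Tuple[str, ...]) -> Dict[str, float]:
    """Made-up lookup table standing in for a real LM."""
    table = {
        ("The",): {" pre": 0.5, " cat": 0.5},
        ("The", " pre"): {"diction": 0.8, "fix": 0.2},
    }
    return table.get(ctx, {})


# p(" pre" | "The") * p("diction" | "The", " pre") = 0.5 * 0.8, about 0.4
p_word = math.exp(word_log_prob(toy_model, ["The"], [" pre", "diction"]))
print(p_word)
```

Two caveats with this sketch: the product is the probability that the text continues with those tokens, not that the word ends there (the model could go on to “predictions”), and a word can have more than one tokenization, so a careful word-level score would have to account for both.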