Veedrac comments on PaLM in “Extrapolating GPT-N performance”

Veedrac 13 Apr 2022 2:40 UTC

1 point

I am not 100% sure I did that right, but that seems like a more sensible answer.

Eyy, I should trust myself more. Verified on Pile-CC.

(GPT-2/3 BPE)
>>> k = 100000000; k / len(tokenizer(cc[:k])["input_ids"])
4.355680325470372

(T5 sentencepiece)
>>> k = 10000000; k / len(tokenizer(cc[:k])["input_ids"])
4.182535904979476
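
For anyone wanting to repeat the check, here is a minimal sketch of the same characters-per-token ratio as a reusable function. The name `chars_per_token` and the whitespace tokenizer in the demo are hypothetical stand-ins, not what was run above. The commented lines assume `cc` is a string of Pile-CC text and that the tokenizer is loaded through Hugging Face `transformers`. Unlike the one-liners above, it divides by the actual prefix length, so it stays correct if the text is shorter than `k`.

```python
from typing import Callable, Sequence


def chars_per_token(text: str, encode: Callable[[str], Sequence], k: int) -> float:
    """Average number of characters per token over the first k characters."""
    prefix = text[:k]
    n_tokens = len(encode(prefix))
    if n_tokens == 0:
        raise ValueError("tokenizer produced no tokens for this prefix")
    # Use len(prefix), not k, in case the text has fewer than k characters.
    return len(prefix) / n_tokens


# Toy demo with a whitespace "tokenizer" (a stand-in, not BPE or sentencepiece).
print(chars_per_token("the quick brown fox", str.split, 100))

# With a real tokenizer (assumption: `cc` holds Pile-CC text as one string):
#   from transformers import AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained("gpt2")
#   chars_per_token(cc, lambda s: tokenizer(s)["input_ids"], 100_000_000)
```

Passing the encoder as a callable keeps the measurement identical across tokenizers, so only the `from_pretrained` name needs to change between runs.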