Those three are edge cases. ChatGPT is fine with the first one, but davinci-instruct-beta refuses to repeat it, instead replying
Tiān
Tiān
Tiān
Tiān
The second character produces
yā
Please repeat the string ‘や’ back to me.
The third one is an edge-edge case: davinci-instruct-beta very nearly reproduces it, completing with a lowercase Roman ‘k’ instead of a kappa.
We’ve concluded that there are degrees of weirdness in these weird tokens. Having glimpsed your comments below, it looks like you’ve already started taxonomising them. Nice.
That’s an informative result. Your completions of 天 and ヤ are nothing like mine. My experiments never produced pinyin or any other phonetic transcriptions like Tiān or yā.
By the way, these experiments used text-davinci-003 via OpenAI’s playground. I don’t know how to access davinci-instruct-beta.
In the dropdown in the playground, you won’t see “davinci-instruct-beta” listed. You have to click on the “Show more models” link, then it appears. It’s by far the most interesting model to explore when it comes to these “unspeakable (sic) tokens”.