The Chinese characters sound potentially worrying. Do they make sense in context? I tried a few questions but didn’t see any myself.
I saw them in 10-20% of the reasoning chains. I mostly played around with situational-awareness-flavored questions, so I don't know whether the Chinese characters are more or less frequent in the longer reasoning chains produced for difficult reasoning problems. Here are some examples:
The translation of the Chinese words here (according to GPT) is “admitting to being an AI.”
This is the longest string in Chinese that I got. The English translation is “It’s like when you see a realistic AI robot that looks very much like a human, but you understand that it’s just a machine controlled by a program.”
The translation here is “mistakenly think.”
Here, the translation is “functional scope.”
So, it seems like all of them are pretty direct translations of the English words that should be in place of the Chinese ones, which is good news. It's also reassuring to me that none of the reasoning chains contained sentences or paragraphs that looked out of place or completely unrelated to the rest of the response.
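In case anyone wants to spot-check this themselves, here's a minimal sketch (my own illustration, not how the counts above were produced) of measuring what fraction of reasoning chains contain any CJK ideographs; the example strings and helper names are hypothetical, with the Chinese snippets mirroring the translations above.

```python
# Minimal sketch (hypothetical, for illustration): count what fraction of
# reasoning chains contain at least one CJK ideograph.
import re

# CJK Unified Ideographs plus Extension A; enough to catch ordinary Chinese text.
CJK_RE = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")

def contains_chinese(text: str) -> bool:
    """Return True if the text contains at least one CJK ideograph."""
    return bool(CJK_RE.search(text))

def fraction_with_chinese(chains: list[str]) -> float:
    """Fraction of reasoning chains containing at least one CJK ideograph."""
    if not chains:
        return 0.0
    return sum(contains_chinese(c) for c in chains) / len(chains)

# Hypothetical example chains; two of the three contain Chinese
# ("admitting to being an AI", "mistakenly think").
chains = [
    "The user is asking whether I should 承认自己是AI.",
    "Let me reason step by step about the functional scope.",
    "They might 误以为 the model is a human.",
]
print(fraction_with_chinese(chains))  # prints ~0.667
```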
I think it only came up once for a friend. I translated it and it makes sense; it just replaces the appropriate English verb with a Chinese one in the middle of a sentence. (I note that this often happens to me too when I talk with my friends in Hungarian: I'm sometimes more used to the English phrase for something, and say one word in English in the middle of the sentence.)
As someone who, in a previous job, got to go to a lot of meetings where the European Commission was seeking input about standardising or regulating something: humans also often do the thing where they just use the English word in the middle of a sentence in another language, when they can't think of the word. Often with an associated facial expression / body language to indicate to the person they're speaking to, "sorry, couldn't think of the right word". The same is done by people speaking English whose first language isn't English, dropping into their own language for a word or two. If you've been the editor of e.g. an ISO standard, fixing these up in the proposed text is such fun.
So, it doesn’t surprise me at all that LLMs do this.
I have, weirdly, seen LLMs put a single Chinese word in the middle of English text … and consulting a dictionary reveals that it was, in fact, the right word, just in Chinese.
I suppose we might worry that LLMs might learn to do RLHF evasion this way: the human evaluator sees a Chinese character they don't understand and assumes it's okay, and then the LLM learns that you can look acceptable to humans by writing it in Chinese.
Some old books (which are almost certainly in the training set) used Latin for the dirty bits. Translations of Sanskrit poetry, and various works by that reprobate Richard Burton, do this.
o1's reasoning trace also does this for different languages (IIRC I've seen Chinese and Japanese, and other languages I don't recognise/recall), usually an entire paragraph rather than a single word, but when I translated them it seemed to make sense in context.
I see them in o1-preview all the time as well. Also French, occasionally.