A language model is not a helpful person who is trying to answer your question.
I would have been much more interested in taking an ordinary predictive language model (e.g. LLaMA 65B), and just getting it to continue some Voynichese.
Everything that makes Bing chat a good chatbot—the finetuning, the pre-prompt telling it that what follows is a chat log written by a helpful chatbot, the separator tokens to tell it who’s speaking—makes it worse at doing straightforward continuations of the Voynich manuscript.
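As a concrete illustration, here is roughly what that raw-continuation setup looks like with a base (non-instruction-tuned) checkpoint via Hugging Face transformers. This is a sketch: the repo name and sampling settings are assumptions, and the prompt is just the opening lines of the Takahashi transliteration with nothing around it.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any base (non-chat) checkpoint will do; this repo name is an assumption.
name = "huggyllama/llama-65b"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")

# The prompt is raw Voynichese (EVA transliteration), with no instructions.
prompt = (
    "fachys.ykal.ar.ataiin.shol.shory.cthres.y.kor.sholdy\n"
    "sory.ckhar.or.y.kair.chtaiin.shar.are.cthar.cthar.dan\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.8)

# Print only the newly generated continuation, not the prompt.
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:]))
```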
Giving it a straightforward directive to continue the text was a decent idea, and you could have tried to be even more minimalist: just entering some Voynichese and seeing how it responded, or asking it to continue some text (that happened to be Voynichese) with no surrounding context about what the text is, so as not to give it preconceptions. Do things in new sessions so that previous messages aren't setting the tone for every later message. The instant it gave you a non-answer about the Voynich manuscript, that told all future messages in the same session that the chatbot side of the conversation is supposed to give non-answers.
Asking it how it’s doing something isn’t going to give accurate results, because it’s not a helpful person who knows how they did things, it’s just predicting what text someone helpful would say. Note that if it’s good enough at predicting helpful text, its ideas for how to do things might still be good ones. But you have to be aware of what you’re getting.
I tried with GPT-NeoX, just giving it some transcribed Voynichese,[1] but it didn't do so well. That's partly because the context window was too short for it to even learn the alphabet (a rough token count after the transcription below shows why), but also maybe because it's not that great at speaking in tongues, and this kind of deciphering task isn't really what LLMs are good at.
[1]
fachys.ykal.ar.ataiin.shol.shory.cthres.y.kor.sholdy
sory.ckhar.or.y.kair.chtaiin.shar.are.cthar.cthar.dan
syaiir.sheky.or.ykaiin.shod.cthoary.cthes.daraiin.sa
ooiin.oteey.oteos.roloty.cthar.daiin.otaiin.or.okan
dair.y.chear.cthaiin.cphar.cfhaiin
ydaraishy
odar.o.y.shol.cphoy.oydar.sh.s.cfhoaiin.shodary
yshey.shody.okchoy.otchol.chocthy.oschy.dain.chor.kos
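For a sense of why the context window was binding, here is a rough token-count check. It's a sketch: the tokenizer is the public GPT-NeoX-20B one, and the 2048-token figure is that model's default context length.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
sample = "fachys.ykal.ar.ataiin.shol.shory.cthres.y.kor.sholdy"
ids = tok(sample).input_ids
print(len(sample.split(".")), "vords ->", len(ids), "tokens")
# EVA text is far from the tokenizer's training distribution, so each vord
# fragments into several tokens; at a few tokens per vord, a 2048-token
# context holds only a few hundred vords of the ~38,000-vord corpus.
```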
I’m glad others are trying this out. I crossposted this over on the Voynich Ninja forum:
https://www.voynich.ninja/thread-3977.html
and user MarcoP already noticed that Bing AI's "Voynichese" doesn't follow VMS statistics in one obvious respect: "The continuation includes 56 tokens: in actual Voynichese, an average of 7 of these would be unique word-types that don't appear elsewhere," whereas "The [Bing AI] continuation is entirely made up of words from Takahashi's transliteration." So no wonder all of the "vords" in the AI's continuation seemed to pass the "sniff test" as valid Voynich vords: Bing AI only used existing Voynich vords! Reusing attested vords is one easy way to produce only valid vords without having any clue about what makes a Voynichese vord valid or how to construct a new one. So my initial optimism that Bing AI understood something deep about Voynichese was probably mistaken.
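MarcoP's statistic is straightforward to reproduce. A sketch, assuming the Takahashi transliteration and the Bing continuation are in local files (the file names are made up), with EVA's "." word separators replaced by spaces:

```python
from collections import Counter

# File names are assumptions; EVA marks word breaks with "." so we split on it.
corpus = open("takahashi_eva.txt").read().replace(".", " ").split()
continuation = open("bing_continuation.txt").read().replace(".", " ").split()

corpus_counts = Counter(corpus)

# For each 56-vord window of the real text, count word-types whose every
# corpus occurrence falls inside that window (unique word-types).
n = 56
novel_per_window = []
for i in range(0, len(corpus) - n + 1, n):
    window_counts = Counter(corpus[i : i + n])
    novel_per_window.append(
        sum(1 for w, c in window_counts.items() if corpus_counts[w] == c)
    )
print("mean novel types per 56-vord window:",
      sum(novel_per_window) / len(novel_per_window))

# The same statistic for the AI continuation: vords absent from the corpus.
print("novel types in continuation:",
      sum(1 for w in set(continuation) if corpus_counts[w] == 0))
```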
That said, would it be possible to train a new LLM in a more targeted way just on English (so that we can interact with it) and on Voynichese so that Voynichese would be a more salient part of its training corpus? Is there enough Voynichese (~170,000 characters, or 38,000 “vords”) to get somewhere with that with current LLMs?
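For scale: 38,000 vords is tiny by LLM standards, so training from scratch is off the table, but fine-tuning a small English-pretrained model on the transliteration is cheap to try, and it approximates the English-plus-Voynichese mix (the English comes from pretraining, the Voynichese becomes salient through fine-tuning). A minimal sketch, assuming the EVA text is in a local file and with purely illustrative model choice and hyperparameters:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model choice and file name are assumptions; any small causal LM works here.
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

text = open("takahashi_eva.txt").read()   # ~170,000 characters of Voynichese
ids = tokenizer(text, return_tensors="pt").input_ids[0]

# Chop the corpus into fixed-length chunks for causal-LM training.
block = 256
chunks = ids[: (len(ids) // block) * block].view(-1, block)
loader = DataLoader(TensorDataset(chunks), batch_size=8, shuffle=True)

optim = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(3):                    # tiny corpus; a few epochs at most
    for (batch,) in loader:
        loss = model(input_ids=batch, labels=batch).loss  # causal LM loss
        loss.backward()
        optim.step()
        optim.zero_grad()
```

Whether this teaches the model anything deep about vord structure, rather than just memorizing the corpus, is exactly the question; the novel-word-type check above would be one way to tell.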