A (perhaps obvious) idea that this brings to mind—you could use GPT to produce endless training data for translation:
Train one GPT on English and a separate GPT on Klingon.
Select some English text that you have the seed of a Klingon translation for. (Where the seed might be just a few tokens, or it might be many paragraphs. In any case it’s meant to be a prefix of a complete translation.)
Use your translator model (which is maybe another head on the same GPT-ish network) to translate from English to Klingon, given only the part of the English text whose translation isn’t already covered by the seed.
Feed the seed translation to the Klingon GPT and ask it how surprised it is when that text is followed by the output of your translator.
Use that surprisal as a loss function for the translator.
In other words, split some English text into a prefix and a suffix. We assume by induction that we already have a translation for the prefix. Train a translator model to translate the suffix into something that GPT (trained on Klingon) finds unsurprising. (You could also try training the translator to exactly match whatever continuation GPT finds most likely, but my guess is that comparing the token-by-token difference between the outputs of the translator and GPT is a weaker training signal than GPT’s surprisal at the translator’s output.)
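Here’s a minimal sketch of what that loss might look like, assuming a frozen Klingon GPT that returns per-token logits and a translator that can sample a continuation and report its own per-token log-probabilities (all names and interfaces here are hypothetical, not anything from an actual library). Since the sampled continuation is discrete, the surprisal isn’t directly differentiable with respect to the translator, so the sketch uses a score-function (REINFORCE-style) surrogate rather than backpropagating through the sampling:

```python
import torch

def surprisal_loss(klingon_gpt, seed_ids, continuation_ids):
    """Average surprisal (negative log-likelihood, nats per token) that the
    frozen Klingon GPT assigns to the translator's continuation, conditioned
    on the seed translation."""
    full = torch.cat([seed_ids, continuation_ids], dim=1)   # (1, S + C)
    with torch.no_grad():
        logits = klingon_gpt(full)                          # (1, S + C, vocab)
    log_probs = torch.log_softmax(logits, dim=-1)
    start = seed_ids.size(1)
    targets = full[:, start:]                 # the continuation tokens
    preds = log_probs[:, start - 1:-1, :]     # positions that predict them
    token_ll = preds.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return -token_ll.mean()

def translator_step(translator, klingon_gpt, english_suffix_ids,
                    seed_ids, optimizer):
    # The continuation is sampled, so the surprisal can't be backpropagated
    # directly; treat (negative) surprisal as a reward instead.
    continuation_ids, logp_tokens = translator.sample(english_suffix_ids)
    surprisal = surprisal_loss(klingon_gpt, seed_ids, continuation_ids)
    # Surrogate whose gradient estimates d/dtheta E[surprisal]; in practice
    # you'd subtract a baseline to reduce variance.
    loss = surprisal.detach() * logp_tokens.sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return surprisal.item()
```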
Assuming GPT-trained-on-English and GPT-trained-on-Klingon are both good, it seems to me (in my uninformed layman’s opinion) that you could likely bootstrap to good translations using this procedure, even w/ no parallel text, and w/ a translator that starts out by outputting gibberish.
Perhaps this is in some sense equivalent (or at least analogous) to the techniques used in the linked Lample et al paper, but I thought it was interesting to think about in this way, with two parallel GPTs and a translator learning to translate between them.
EDIT: This is not meant to answer Paul’s question of how you could train a system to explain itself. Just a thought that the post brought to mind.
EDIT2: I suppose you could also train the translator on its surprisal at GPT’s output. (Or maybe I shouldn’t think of it as output and surprisal—since the output is sampled from a probability distribution, maybe you just compare the two distributions? I suppose this comes down to the details of the two networks and the exact format of their outputs.)
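For the EDIT2 variant, a hedged sketch of “just compare the two distributions” could be a per-position KL divergence between the Klingon GPT’s next-token distribution and the translator’s, assuming both models expose logits over the same Klingon vocabulary at the same positions (again, the names and shapes are assumptions, not a particular library’s API):

```python
import torch.nn.functional as F

def distribution_matching_loss(gpt_logits, translator_logits):
    """KL(GPT || translator), averaged over positions.

    Both arguments are (batch, seq_len, vocab) logits over the Klingon
    vocabulary at the same positions, i.e. conditioned on the same prefix."""
    p_gpt = F.softmax(gpt_logits, dim=-1)
    log_p_gpt = F.log_softmax(gpt_logits, dim=-1)
    log_p_tr = F.log_softmax(translator_logits, dim=-1)
    kl = (p_gpt * (log_p_gpt - log_p_tr)).sum(dim=-1)   # per-position KL
    return kl.mean()
```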