Thanks for posting this!
Regarding the prompts problem: Is there a way to reverse a language model so it predicts preceding tokens, rather than subsequent ones? Then you could feed it the results from a standard subsequent-tokens-predicting LM, and ask it to predict which prompt generated it. (Probably something like this is already being done, and I just don’t know the term for it.)
(Technically, I suppose one could also train a forward-looking LM on a dataset with reversed strings, then feed it prompts of reversed strings, to make it predict preceding tokens. So I guess the remaining question is whether one can get the same behavior without retraining the LM.)
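(For concreteness, the data-handling side of that reversed-string idea is tiny. A sketch, with the GPT-2 tokenizer standing in for whatever the model actually uses: reverse the token order of every training document, train an ordinary causal LM on that, and then un-reverse whatever it samples after a reversed answer to read off a candidate preceding prompt.)

```python
# Sketch: train an ordinary left-to-right LM on reversed token sequences, so
# "predict the next token" becomes "predict the preceding token" of the
# original text. The tokenizer is a stand-in; the training loop itself is
# unchanged and not shown.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

def reverse_ids(text):
    # Tokenize normally, then flip the token order (not the characters).
    return list(reversed(tok(text)["input_ids"]))

def decode_prompt_candidate(sampled_ids):
    # At inference: condition the reversed-trained LM on reverse_ids(answer),
    # sample a "continuation", then un-reverse it to read the candidate prompt.
    return tok.decode(list(reversed(sampled_ids)))
```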
Yes, that works. This is how models like CogView can go both ways with the reverse-caption trick, akin to projectors in GANs for reversing images to latents (see also Decision Transformer). Since it’s all just a concatenated sequence of tokens, the model and training are indifferent to whether you trained it with the text caption tokens first and then the image tokens (to generate an image from text) or the image tokens first and then the text caption (to train a captioner). Apparently it costs very little training to finetune the reverse direction: the model already has most or all of the necessary knowledge (which makes sense) and just needs to rejigger its input handling. It also gives you a way to self-critique image generation: if you train a reverse-captioning version of CogView, then instead of calling out to CLIP to gauge the quality of (text, generated-image) pairs, you can simply run the generated image’s tokens through the reverse CogView and calculate the likelihood of the text caption token by token & sum; if the total likelihood of the caption is low given the generated image, then it wasn’t a good generated image.
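A sketch of that self-critique scoring, assuming a hypothetical reverse (image-tokens-then-caption-tokens) model with an HF-style output; the idea is just to sum the per-token log-probabilities of the intended caption given the generated image’s tokens:

```python
import torch
import torch.nn.functional as F

def caption_loglik(reverse_model, image_tokens, caption_tokens):
    """Score a generated image by how likely the intended caption is under a
    reverse (image-then-text) captioning model. `image_tokens` and
    `caption_tokens` are 1-D LongTensors; `reverse_model` is a hypothetical
    causal transformer over [image tokens, caption tokens]."""
    seq = torch.cat([image_tokens, caption_tokens]).unsqueeze(0)
    with torch.no_grad():
        logprobs = F.log_softmax(reverse_model(seq).logits, dim=-1)
    offset = image_tokens.numel()
    total = 0.0
    for i, tok_id in enumerate(caption_tokens):
        # logits at position p predict token p+1, so the caption token at
        # absolute position offset+i is predicted at position offset+i-1
        total += logprobs[0, offset + i - 1, tok_id].item()
    return total  # higher = caption more plausible given this image

# e.g. keep the best of several samples for one caption:
# best = max(candidate_images, key=lambda img: caption_loglik(m, img, cap))
```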
No one’s done this with text models that I know of. Probably everyone is too busy with complicated prompt-finetuning methods, doing gradient ascent on the model in various ways to optimize the inputs or the weights/biases, to bother trying reversal. I’d predict that trying to infer the necessary prompt with the reversing trick wouldn’t work for small models anyhow, and would be a waste of time compared to directly editing/controlling the model.
Also, even if one had a reversed model available, it would not be trivial to generate useful prompts with it.
The goal is (roughly) to find a prompt that maximizes P(correct_answer | prompt). But the reversed model gives you
P(prompt | correct_answer) = P(correct_answer | prompt) · P(prompt) / P(correct_answer)

The answer is a constant so we can ignore it, but P(prompt) is problematic: we don’t care about the likelihood of the prompt, since we plan to condition on it.
Moreover, we need a way to communicate what the prompt is supposed to mean, and a single answer isn’t a sufficient latent for that. (Consider “...2002? Barack Obama” → “who was the Illinois State senator for the 13th district in the year...”)
Prompt-finetuning resolves the ambiguity by averaging over multiple answers, which could work here, but would require an unusual sampling technique (average likelihood over multiple prompts?).
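To make that concrete, here is a minimal sketch of the quantity we actually care about, using GPT-2 from HuggingFace transformers purely as a stand-in forward model: score a candidate prompt by the average log-likelihood it assigns the correct answers across several (question, answer) pairs, so P(prompt) never enters and no single answer has to carry the prompt’s meaning on its own.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # stand-in forward LM
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def answer_loglik(prompt, question, answer):
    """log P(answer | prompt + question) under the forward model, summed over
    the answer's tokens. (Tokenizing the answer separately from the context is
    a slight approximation.)"""
    ctx = tok(prompt + question, return_tensors="pt").input_ids
    ans = tok(answer, return_tensors="pt").input_ids
    ids = torch.cat([ctx, ans], dim=1)
    with torch.no_grad():
        logprobs = F.log_softmax(lm(ids).logits, dim=-1)
    total = 0.0
    for i in range(ans.shape[1]):
        pos = ctx.shape[1] + i - 1      # logits at pos predict token pos+1
        total += logprobs[0, pos, ids[0, pos + 1]].item()
    return total

def prompt_score(prompt, qa_pairs):
    # Average over several (question, answer) pairs to pin down what the
    # prompt is supposed to mean, rather than relying on a single answer.
    return sum(answer_loglik(prompt, q, a) for q, a in qa_pairs) / len(qa_pairs)
```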
My intuition is that it would Just Work for large smart models, which are easy to prompt and few-shot well: take the ‘French->English’ prompt; if you reversed that and kept getting ‘French->English’ back as your prompt across multiple examples, well, that’s obviously a good prompt, whatever the objective likelihood of text starting with ‘French->English’ may be. And to the extent that it didn’t work, either it would be relatively ‘obvious’ what the reversed outputs have in common, so you could try to come up with a generic prompt (“who was the X State senator for the nth district in the year MMMM?”), or they would at least be a good starting point for gradient-ascent-type procedures (in the same way that with GAN projectors you often treat it as a hybrid: a quick projection of amortized computation into roughly the right latent z, then gradient ascent to clean it up): you get a bunch of reversed prompts as seeds for optimization, whether to find the best one or to maximize diversity for some other reason.
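Here is a rough sketch of that hybrid ‘seed then refine’ idea, again with GPT-2 as a stand-in and with helper names that are mine rather than anything established: initialize a soft prompt from the embeddings of a candidate hard prompt (e.g. one spat out by a reversed model), then run gradient ascent on those embeddings to push up the likelihood of a known answer.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # stand-in model
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()
for p in lm.parameters():
    p.requires_grad_(False)                          # only the prompt is tuned
emb = lm.get_input_embeddings()

def refine_prompt(seed_prompt, question, answer, steps=100, lr=1e-2):
    """Gradient-ascend a soft prompt, initialized from a hard seed prompt,
    to raise log P(answer | prompt, question). Returns the tuned prompt
    embeddings. (A real run would sum the loss over several (q, a) pairs.)"""
    seed_ids = tok(seed_prompt, return_tensors="pt").input_ids
    q_ids = tok(question, return_tensors="pt").input_ids
    a_ids = tok(answer, return_tensors="pt").input_ids
    soft = emb(seed_ids).detach().clone().requires_grad_(True)   # (1, p, d)
    tail = emb(torch.cat([q_ids, a_ids], dim=1)).detach()
    opt = torch.optim.Adam([soft], lr=lr)
    p_len, q_len = seed_ids.shape[1], q_ids.shape[1]
    for _ in range(steps):
        logits = lm(inputs_embeds=torch.cat([soft, tail], dim=1)).logits
        logprobs = F.log_softmax(logits, dim=-1)
        # negative log-likelihood of the answer tokens only
        loss = sum(-logprobs[0, p_len + q_len + i - 1, a_ids[0, i]]
                   for i in range(a_ids.shape[1]))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return soft.detach()
```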
There is no known way to “reverse” an LM like that.
(Well, besides the brute force method, where you generate a preceding token by looping over all possible values for that token. GPT’s vocab has ~50k tokens, so this is ~50k× slower than forward sampling.)
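For what it’s worth, the brute-force version is easy to write down (a sketch with GPT-2 as a stand-in; batching hides the loop but not the cost): for every candidate preceding token you run a forward pass over [candidate] + known text and score how well the candidate explains what follows, which is exactly where the ~vocab-size slowdown comes from.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # stand-in model
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def preceding_token_scores(known_ids, batch_size=256):
    """Brute-force scores for the token preceding `known_ids` (a 1-D
    LongTensor): for each candidate v, compute sum_i log P(known[i] | v,
    known[:i]), i.e. log P(known | v). One forward pass per vocab entry."""
    V = lm.config.vocab_size
    scores = torch.empty(V)
    for start in range(0, V, batch_size):
        cands = torch.arange(start, min(start + batch_size, V))
        # each row of the batch: [candidate token, known tokens...]
        seqs = torch.cat([cands.unsqueeze(1),
                          known_ids.unsqueeze(0).expand(len(cands), -1)], dim=1)
        logprobs = F.log_softmax(lm(seqs).logits, dim=-1)
        tgt = seqs[:, 1:]                 # logits at j predict token j+1
        picked = logprobs[:, :-1, :].gather(2, tgt.unsqueeze(2)).squeeze(2)
        scores[cands] = picked.sum(dim=1)
    # combine with a prior over the candidate token, then sample/argmax
    return scores
```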
There are some LMs that naturally work in both directions: namely, masked language models (eg BERT), as opposed to causal language models (eg GPT). Rather than taking a prefix as input and continuing it, masked language models take a complete string with some positions randomly masked or corrupted, and they’re trained to undo those changes.
However, these models are mostly used for things other than text generation; it’s possible to make them write text, but the resulting text tends to be lower-quality than what you can get from a comparably sized GPT.
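As a small illustration of that bidirectionality (a sketch using HuggingFace’s fill-mask pipeline with an off-the-shelf BERT; the example sentence just echoes the Obama example above), the masked position can sit at the very start of the string, so the model is effectively guessing a preceding word from the text that follows it:

```python
from transformers import pipeline

# BERT conditions on context from both sides, so a masked position at the
# start of the string is filled in using only the text that comes after it.
fill = pipeline("fill-mask", model="bert-base-uncased")

for guess in fill("[MASK] was the Illinois State senator for the 13th district in 2002."):
    print(f"{guess['token_str']:>12}  {guess['score']:.3f}")
```

Of course a single [MASK] slot only buys you one token of preceding text; generating a whole preceding prompt this way takes iterative re-masking tricks, which is part of why the resulting text tends to be worse than a comparable GPT’s.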