My intuition is that it would Just Work for large smart models, which are easy to prompt and few-shot well. Take the ‘French->English’ prompt: if you reversed that and kept getting ‘French->English’ as your prompt across multiple examples, well, that’s obviously a good prompt, whatever the objective likelihood of text starting with ‘French->English’ may be. And to the extent that it didn’t work, either it would be relatively ‘obvious’ what the reversed outputs have in common, so you could try to come up with a generic prompt (“who was the X State senator for the nth district in the year MMMM?”), or the reversed prompts would at least be good starting points for gradient ascent-type procedures: in the same way that with GAN projectors you often treat it as a hybrid (a quick projection of amortized computation into roughly the right latent z, then gradient ascent to clean it up), you get a bunch of reversed prompts as seeds for optimization, whether to find the single best one or to maximize diversity for some other reason.
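A minimal sketch of that hybrid, assuming PyTorch and Hugging Face transformers are available (the model name, the seed prompt, and the example context/target pair are placeholders, not from the original): start from the token embeddings of a reversed-prompt guess, then run gradient ascent on those soft-prompt embeddings to maximize the likelihood of the desired output, analogous to projecting into a GAN's latent space and then refining z with gradient steps.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model; any causal LM works the same way
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

seed_prompt = "French -> English:"  # reversed-prompt guess used as the seed
context = " fromage ->"             # hypothetical example input
target = " cheese"                  # output we want the prompt to elicit

# Embed the seed prompt and make it a trainable "soft prompt".
embed = model.get_input_embeddings()
seed_ids = tok(seed_prompt, return_tensors="pt").input_ids
soft_prompt = embed(seed_ids).detach().clone().requires_grad_(True)

ctx_ids = tok(context, return_tensors="pt").input_ids
tgt_ids = tok(target, return_tensors="pt").input_ids

opt = torch.optim.Adam([soft_prompt], lr=1e-3)
for step in range(100):
    opt.zero_grad()
    # Concatenate soft prompt + context + target embeddings and run the LM.
    inputs = torch.cat([soft_prompt, embed(ctx_ids), embed(tgt_ids)], dim=1)
    logits = model(inputs_embeds=inputs).logits
    # Loss: negative log-likelihood of the target tokens only
    # (logits at position i predict the token at position i+1).
    n_tgt = tgt_ids.shape[1]
    tgt_logits = logits[:, -n_tgt - 1:-1, :]
    loss = torch.nn.functional.cross_entropy(
        tgt_logits.reshape(-1, tgt_logits.size(-1)), tgt_ids.reshape(-1)
    )
    loss.backward()
    opt.step()
```

In practice you would presumably average the loss over many (context, target) pairs rather than one, and, if you want a human-readable prompt back out, project the optimized soft prompt onto the nearest hard tokens at the end; the point here is just the seeding step, where the reversed prompt replaces a random or generic initialization.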