Are you talking about prompting here in the traditional sense it’s used with, e.g., GPT-3? Because if so,
The downside with prompting is that there are lots of conditionals we can’t turn into prompts. For instance:
Sample text from the model that humans will rate as having positive sentiment.
Sample text from the model that never involves violence.
Sample text from the model that contains a valid chess game.
None of these can be expressed in terms of fixed tokens in the context window.
is a confusing assumption to me. Is it meant to be quantitative, like it’s hard to express this within a finite context window? Because it certainly seems possible to prompt the model to sample those sequences fairly well (“Below is a sequence from a chess game that begins with 1. e4”). Your phrasing makes it sound like a qualitative difference, which makes me think I’m either misunderstanding your definition of prompting (possibly in the strictness of the sequence sampling?) or missing something super obvious.
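To be concrete, here’s a minimal sketch of what I mean by prompting: fix some tokens at the start of the context window and sample the continuation (the model choice and decoding settings here are illustrative assumptions, not load-bearing):

```python
# Prompting as prefix conditioning: fix the first k tokens, sample the rest.
# Model choice and decoding settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Below is a sequence from a chess game that begins with 1. e4"
inputs = tokenizer(prompt, return_tensors="pt")

# Sampling then draws from p(x_{k+1:n} | x_{1:k} = prompt).
outputs = model.generate(**inputs, do_sample=True, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```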
Sorry, the chess example is not a great one. The first two are more what I’m imagining, and the best example is the “never involves violence” one.
To clarify the distinction: my claim is that some conditionals are inaccessible via prompting alone (without any clever sampling or filtering) but are accessible via fine-tuning.
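In symbols (my notation, just to pin down the claim): prompting can only condition on the event that a prefix of the sample equals some fixed token string, while fine-tuning can aim at the conditional given an arbitrary predicate on the whole sample:

```latex
\underbrace{p(x_{k+1:n} \mid x_{1:k} = t)}_{\text{prompting: fixed token prefix } t}
\qquad \text{vs.} \qquad
\underbrace{p(x \mid C(x) = 1)}_{\text{fine-tuning: arbitrary predicate } C}
```

For C = “the text contains no violence”, there need not be any prefix t that induces the same conditional.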
You can get any conditional you want if you filter the output sufficiently, but then I start to worry about efficiency...
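To spell out the filtering option, it amounts to rejection sampling, something like this sketch (`sample_fn` and `accept` are hypothetical stand-ins for the model’s sampler and, say, a violence classifier):

```python
# Rejection sampling: draw from p(x | accept(x)) by resampling until the
# predicate holds. `sample_fn` and `accept` are hypothetical stand-ins.
def sample_conditional(sample_fn, accept, max_tries=10_000):
    for _ in range(max_tries):
        x = sample_fn()
        if accept(x):
            return x
    # The efficiency worry: expected tries scale as 1 / Pr[accept(x)],
    # which blows up for rare conditionals.
    raise RuntimeError("acceptance rate too low")
```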
Don’t the results from this paper (showing that prompt tuning matches the performance of fine-tuning for large models) imply that it’s more of a quantitative gap than a qualitative one? It might be difficult to get some conditionals with prompting alone, but that might say more about how good we are right now at prompt design than about anything fundamental in what prompting can access without clever sampling, especially since prompt-tuning performance rises with scale relative to fine-tuning.
I think the name “prompt tuning” is a little misleading in this context, because the prompts in that setting aren’t actually fixed tokens from the model’s vocabulary and we can’t interpret them as saying “this token in the text string is fixed”. In particular, prompt tuning seems much closer to model fine-tuning in terms of what’s actually happening (gradient descent on the embeddings of some auxiliary tokens).
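Concretely, the mechanics look something like this sketch (dimensions are illustrative, and I’m assuming a Hugging Face-style causal LM that accepts inputs_embeds):

```python
# Prompt tuning sketch: gradient descent on the embeddings of auxiliary
# "tokens" prepended in embedding space, with the base model frozen.
import torch

n_soft, d_model = 20, 768  # illustrative sizes
soft_prompt = torch.nn.Parameter(0.02 * torch.randn(n_soft, d_model))
optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)  # only these vectors train

def forward_with_soft_prompt(model, input_ids):
    # The learned vectors live in embedding space and need not correspond to
    # any token in the vocabulary, which is the sense in which they aren't
    # "fixed tokens in the text string".
    tok_embeds = model.get_input_embeddings()(input_ids)           # (B, T, d)
    prefix = soft_prompt.unsqueeze(0).expand(input_ids.size(0), -1, -1)
    return model(inputs_embeds=torch.cat([prefix, tok_embeds], dim=1))
```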
Yeah, but they’re still operating on the same channel. I guess what I don’t get is how dropping the requirement that the prompt consist of tokens semantically sensible to us would allow prompting to access conditionals that were qualitatively beyond reach before.