Sorry, the chess example is not a great one. The first two are more what I’m imagining, and the best example is the “never involves violence” one.
To clarify the distinction: my claim is that some conditionals are inaccessible via prompting alone (without any clever sampling/filtering) but are accessible via fine-tuning.
You can get any conditional you want if you filter the output sufficiently, but then I start to worry about efficiency...
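To make that worry concrete, here's a minimal rejection-sampling sketch (the toy model and the `generate`/`accept` names are illustrative, not anything real): the expected number of tries is 1/P(condition holds), so conditionals the base model rarely satisfies get arbitrarily expensive to filter for.

```python
import random

def sample_conditional(generate, accept, max_tries=100_000):
    """Rejection sampling: draw unconditionally until the output
    satisfies the predicate. Exact, but expensive for rare conditions."""
    for tries in range(1, max_tries + 1):
        text = generate()      # unconditional sample from the model
        if accept(text):       # the conditional, enforced as a filter
            return text, tries
    raise RuntimeError(f"no accepted sample in {max_tries} tries")

# Toy stand-in for a language model.
def toy_generate():
    words = ["fight", "peace", "journey", "battle", "garden"]
    return " ".join(random.choices(words, k=10))

# A crude "never involves violence" conditional as an output filter.
def never_violent(text):
    return "fight" not in text and "battle" not in text

story, tries = sample_conditional(toy_generate, never_violent)
print(f"accepted after {tries} tries")  # ~165 tries expected for this toy
```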
Don’t the results from this paper (showing that prompt tuning matches the performance of fine-tuning for large models) imply that the gap is quantitative rather than qualitative? It might be difficult to get some conditionals with prompting alone, but that might say more about how good we currently are at prompt design than about anything fundamental limiting what prompting (without clever sampling) can access, especially given that prompt-tuning performance improves with scale relative to fine-tuning.
I think the name “prompt tuning” is a little misleading in this context, because the prompts in that setting aren’t actually fixed tokens from the model’s vocabulary, so we can’t interpret them as saying “this token in the text string is fixed”. Mechanically, prompt tuning is much closer to model fine-tuning: it’s gradient descent on the embeddings of some auxiliary tokens.
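To make the mechanics concrete, here's a toy sketch (the tiny “model” and all names are hypothetical, not any real library's API): the prompt being tuned is a matrix of free vectors in embedding space, not token ids, so in general no vocabulary token corresponds to a learned prompt vector, and the optimization is ordinary gradient descent restricted to those vectors.

```python
import torch
import torch.nn as nn

d_model, vocab_size, n_prompt = 16, 100, 4

# Frozen stand-in "language model": an embedding layer plus a linear head.
embed = nn.Embedding(vocab_size, d_model)
lm_head = nn.Linear(d_model, vocab_size)
for p in list(embed.parameters()) + list(lm_head.parameters()):
    p.requires_grad_(False)  # model weights stay fixed, as in prompt tuning

# The only trainable parameters: free vectors in embedding space.
soft_prompt = nn.Parameter(torch.randn(n_prompt, d_model) * 0.02)
opt = torch.optim.Adam([soft_prompt], lr=1e-2)

input_ids = torch.randint(vocab_size, (2, 8))  # toy batch of token ids
targets = torch.randint(vocab_size, (2,))      # toy next-token targets

for step in range(100):
    x = embed(input_ids)                                   # (2, 8, d_model)
    x = torch.cat([soft_prompt.expand(2, -1, -1), x], 1)   # prepend prompt
    logits = lm_head(x.mean(dim=1))                        # toy "LM" output
    loss = nn.functional.cross_entropy(logits, targets)
    opt.zero_grad()
    loss.backward()  # gradients flow only into soft_prompt
    opt.step()
```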
Yeah, but they’re still operating on the same channel. I guess what I don’t get is how removing the restriction that the prompts be vocabulary tokens that are semantically sensible to us would let prompt tuning access conditionals that were qualitatively out of reach before.