LLM prompt engineering can replace weaker ML models
Epistemic status: Half speculation, half solid advice. I’m writing this up as I’ve said this a bunch IRL.
Current large language models (LLMs) are sufficiently good at in-context learning that for many NLP tasks, it’s often better and cheaper to query an LLM with an appropriate prompt than to train your own ML model. A lot of this comes from my personal experience (e.g. replacing existing “SoTA” models in other fields with prompted LLMs and getting better performance), but there are also examples with detailed writeups, for example:
Armstrong and Gorman’s Using GPT-Eliezer against ChatGPT Jailbreaking uses ChatGPT with a prompt (roughly: pretend to be Eliezer and judge whether a prompt is safe) to filter out queries that OpenAI’s moderation interface fails to flag.
Brooks et al’s In-Context Policy Iteration replaces policy iteration in parameter space with policy iteration in prompt space.
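To make the claim concrete, here’s a minimal sketch of the pattern: a zero-shot “classifier” that is nothing but a prompt template plus some light output parsing. `call_llm` is a hypothetical helper standing in for whatever LLM API you use; nothing here depends on a specific provider.

```python
# Minimal sketch: a prompted LLM standing in for a trained sentiment classifier.
# `call_llm` is a hypothetical helper wrapping whatever LLM API you use.

def call_llm(prompt: str) -> str:
    """Hypothetical wrapper: send `prompt` to an LLM and return its text reply."""
    raise NotImplementedError("wire this up to your LLM provider of choice")

PROMPT_TEMPLATE = (
    "Classify the sentiment of the following review as exactly one word, "
    "either 'positive' or 'negative'.\n\n"
    "Review: {review}\n"
    "Sentiment:"
)

def classify_sentiment(review: str) -> str:
    """Zero-shot classification via a prompt, with no task-specific training."""
    reply = call_llm(PROMPT_TEMPLATE.format(review=review)).strip().lower()
    return "positive" if "positive" in reply else "negative"
```

The “model” here is just the prompt template and the parsing step; switching to a new task means swapping the template, not collecting labels and retraining.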
Here’s a rough sketch of the analogy:
Prompt <-> Weights
Prompt engineering/Prompt finetuning <-> Training/finetuning
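To flesh out the second line of the analogy, here’s a hedged sketch of what “training in prompt space” could look like: instead of gradient steps on weights, you score candidate prompts against a small labelled dev set and keep the best one. It reuses the hypothetical `call_llm` helper from the sketch above; `candidate_prompts` and `dev_set` are placeholders you’d supply.

```python
# Sketch of the analogy: a "training loop" that searches over prompts rather than
# updating weights. Assumes the hypothetical `call_llm` helper defined earlier.

def evaluate_prompt(template: str, dev_set: list[tuple[str, str]]) -> float:
    """Accuracy of a prompt template on (text, label) pairs -- the analogue of a loss."""
    correct = 0
    for text, label in dev_set:
        prediction = call_llm(template.format(input=text)).strip().lower()
        correct += prediction == label
    return correct / len(dev_set)

def tune_prompt(candidate_prompts: list[str], dev_set: list[tuple[str, str]]) -> str:
    """Analogue of training/finetuning: keep the candidate that scores best on dev data."""
    return max(candidate_prompts, key=lambda template: evaluate_prompt(template, dev_set))
```

The dev-set accuracy plays the role of the training objective, and the search over candidate prompts plays the role of the optimizer; the artifact you ship is a prompt rather than a checkpoint.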