In praise of prompting
(Content: I say obvious beginner things about how prompting LLMs is really flexible, correcting my previous preconceptions.)
I’ve been doing some of my first ML experiments in the last couple of months. Here I outline the thing that surprised me the most:
Prompting is both lightweight and really powerful.
As context, my preconception of ML research was something like “to do an ML experiment you need to train a large neural network from scratch, doing hyperparameter optimization, maybe doing some RL and getting frustrated when things just break for no reason”.[1]

And, uh, no.[2] I now think my preconception was really wrong. Let me say some things that me-three-months-ago would have benefited from hearing.
When I say that prompting is “lightweight”, I mean that both in absolute terms (you can just type text in a text field) and in relative terms (compare to e.g. RL).[3] And sure, I have done plenty of coding recently, but the coding has been basically just to streamline prompting (automatically sampling through the API, moving data from one place to another, handling large prompts, etc.) rather than ML-specific programming. This isn’t hard, just basic Python.
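To give a flavor of what I mean by glue code, here is a minimal sketch of sampling a batch of prompts through the API and dumping the results to a file. It assumes the OpenAI Python client (openai >= 1.0); the model name and file names are just placeholders, not anything I’m specifically recommending.

```python
# Rough sketch of prompting glue code: sample prompts through the API,
# move the results from one file to another. Model and paths are placeholders.
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def sample(prompt: str, n: int = 1, model: str = "gpt-4o-mini") -> list[str]:
    """Sample n completions for a single prompt."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        n=n,
        temperature=1.0,
    )
    return [choice.message.content for choice in response.choices]


# "Moving data from one place to another": read prompts, write completions.
with open("prompts.jsonl") as f:
    prompts = [json.loads(line)["prompt"] for line in f]

with open("completions.jsonl", "w") as f:
    for prompt in prompts:
        for completion in sample(prompt):
            f.write(json.dumps({"prompt": prompt, "completion": completion}) + "\n")
```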
When I say that prompting is “really powerful”, I mean a couple of things.
First, “prompting” basically just means “we don’t modify the model’s weights”, which, stated like that, actually covers quite a bit of surface area. Concrete things one can do: few-shot examples, best-of-N sampling, looking at trajectories as you put multiple turns in the prompt, constructing simulation settings and seeing how the model behaves, and so on.
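Best-of-N, for instance, is just a short loop once you have some way to score completions. A sketch, reusing the `sample` helper from the snippet above; the scoring function is a stand-in for whatever you actually care about:

```python
def score(completion: str) -> float:
    # Stand-in scorer: in practice this would be e.g. how many gold coins
    # the proposed action sequence collects when run in your environment.
    return float(len(completion))


def best_of_n(prompt: str, n: int = 8) -> str:
    """Best-of-N: sample N completions and keep the highest-scoring one."""
    completions = sample(prompt, n=n)  # `sample` from the sketch above
    return max(completions, key=score)
```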
Second, suitable prompting actually lets you get effects quite close to supervised fine-tuning or reinforcement learning(!) Let me explain:
Imagine that I want to train my LLM to be very good at, say, collecting gold coins in mazes. So I create some data. And then what?
My cached thought was “do reinforcement learning”. But actually, you can just do “poor man’s RL”: you sample the model a few times, take the action that led to the most gold coins, supervised fine-tune the model on that and repeat.
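A sketch of one round of that loop, using OpenAI’s fine-tuning API. Here `coins_collected` is a hypothetical function that runs a proposed action sequence in the maze and counts the coins, and the model name is again a placeholder:

```python
# One round of "poor man's RL": sample, keep the best action per prompt,
# then supervised fine-tune on the winners. `coins_collected` is hypothetical.
import json

from openai import OpenAI

client = OpenAI()
BASE_MODEL = "gpt-4o-mini-2024-07-18"  # placeholder


def poor_mans_rl_round(task_prompts: list[str], samples_per_prompt: int = 8) -> str:
    best_pairs = []
    for prompt in task_prompts:
        response = client.chat.completions.create(
            model=BASE_MODEL,
            messages=[{"role": "user", "content": prompt}],
            n=samples_per_prompt,
            temperature=1.0,
        )
        actions = [choice.message.content for choice in response.choices]
        best_action = max(actions, key=lambda a: coins_collected(prompt, a))
        best_pairs.append((prompt, best_action))

    # Write the winners in the chat fine-tuning JSONL format and start a job.
    with open("train.jsonl", "w") as f:
        for prompt, action in best_pairs:
            f.write(json.dumps({"messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": action},
            ]}) + "\n")

    train_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
    job = client.fine_tuning.jobs.create(training_file=train_file.id, model=BASE_MODEL)
    return job.id  # once the job finishes, repeat with the fine-tuned model
```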
So, just do poor man’s RL? Actually, you can just do “very poor man’s RL”, i.e. prompting: instead of doing supervised fine-tuning on the data, you simply use the data as few-shot examples in your prompt.
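In code, the difference is just where the winning examples go: into a fine-tuning file in the sketch above, or into the message list here (same placeholder names as before):

```python
# "Very poor man's RL": put the best (prompt, action) pairs into the context
# as few-shot examples instead of fine-tuning on them.
def few_shot_messages(best_pairs: list[tuple[str, str]], new_prompt: str) -> list[dict]:
    messages = []
    for example_prompt, example_action in best_pairs:
        messages.append({"role": "user", "content": example_prompt})
        messages.append({"role": "assistant", "content": example_action})
    messages.append({"role": "user", "content": new_prompt})
    return messages
```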
My understanding is that many forms of RL are quite close to poor man’s RL (the resulting weight updates are kind of similar), and poor man’s RL is quite similar to prompting (intuition being that both condition the model on the provided examples).[4]
As a result, prompting suffices way more often than I had previously thought.
[1] This negative preconception probably biased me towards inaction.

[2] Obviously some people do the things I described; I just strongly object to the “need” part.

[3] Let me also flag that supervised fine-tuning is much easier than I initially thought: you literally just upload the training data file at https://platform.openai.com/finetune

[4] I admit that I’m not very confident in the claims of this paragraph. This is what I’ve gotten from Evan Hubinger, who seems more confident in them, and I’m partly just deferring here.