In some sense this is all fine: it’s a sort of meta-learning, where the components of the system include testers such as Gary Smith and those 40 contractors they hired through Upwork and ScaleAI. Together they can fix thousands of queries a day.
On the other hand, there does seem to be something funny about how GPT-3 presents this shiny surface where you can send it any query and it gives you an answer, while under the hood a bunch of freelancers are busily checking the responses and rewriting them to make the computer look smart.
It’s kinda like if someone were showing off some fancy car engine but the vehicle is actually being powered by some hidden hamster wheels. The organization of the process is itself impressive, but it’s not quite what is advertised.
To be fair, OpenAI does state that “InstructGPT is then further fine-tuned on a dataset labeled by human labelers.” But this still seems misleading to me. It’s not just that the algorithm is fine-tuned on the dataset. It seems that these freelancers are being hired specifically to rewrite the output.
Specifically, check out figure 2 from the paper; the humans both ‘provide demonstrations’ (i.e. writing the completion given a prompt) and rank outputs from best to worst (the thing I had expected naively from ‘supervised fine-tuning’). The model is presumably still generating the completions word-by-word in the normal way instead of just parroting back what a human wrote for it to say [noting that this is sort of silly because all of it is like that; what I mean is that it’s still hybridizing all of its relevant inputs, instead of just pointing to one labeller input].
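To make the “rank outputs from best to worst” step concrete: in the InstructGPT paper those rankings are used to train a separate reward model, with a pairwise loss of the form -log(sigmoid(r(x, y_chosen) - r(x, y_rejected))). Here is a toy sketch in Python; the scalar rewards are made-up numbers, since the real reward model is itself a fine-tuned language model:

```python
import math

def pairwise_ranking_loss(reward_chosen: float, reward_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)): small when the reward model
    agrees with the human labeler's ranking, large when it disagrees."""
    return -math.log(1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected))))

# Suppose a labeler ranked completion A above completion B for some prompt,
# and a toy reward model scores them 2.0 and 0.5 respectively:
print(pairwise_ranking_loss(2.0, 0.5))  # ~0.20: model agrees with the human
print(pairwise_ranking_loss(0.5, 2.0))  # ~1.70: model disagrees; training pushes the scores apart
```

The policy model is then optimized against this learned reward via reinforcement learning, so the freelancers’ judgments end up baked into the weights rather than sitting live in the response path, which is the point about the model still “hybridizing all of its relevant inputs.”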
I would like to “yes! and…” this practical point.
There is perhaps a deeper issue in widespread understanding of “computers in general” which is that a very large number of people don’t seem to realize that this...
It’s kinda like if someone were showing off some fancy car engine but the vehicle is actually being powered by some hidden hamster wheels.
...is how essentially all computer processes everywhere have always worked, just modulo more or fewer intermediating steps between “the people who build and turn hamster wheels” and “the appearance of a fancy machine”.
Many of the best “decision support systems” are basically translations of the way smart people already plan and act, using whiteboards and kanban boards to coordinate their actions.
Then the computer programming step is at least partly (inevitably) about alienating the guys manning the whiteboards from their own cultural knowledge and behavioral flexibility for the sake of convenience and speed and faster throughput and so on.
In the standard oligarchic framing that usually occurs AFTER people realize it is “hamster wheels all the way down,” you finally get to the human arguments about such mechanical systems: arguments that focus on tweaking the system toward various operational modes (more or less amenable to various systems of hamster wheels), based on which outcomes programmers/PMs/owners actually desire, and on who owns the profits or bears the costs of success/failure.
That’s the politics part. There is always a politics part <3
A CS prof once told a story I haven’t yet forgotten about his “first ever paid programming gig as a junior dev on a team with a fun computing puzzle.” The basic deal was that there was a trucking company, and the trucking company had to solve the 3D knapsack problem for every truck. (This is NP-hard. Assuming P != NP, no algorithm can efficiently guarantee the optimal packing, and exhaustive search becomes infeasible beyond maybe 20 “things to pack.”) However, there was an old guy with a bad back, who would walk around the warehouse and the trucks and tell young men with strong backs where to put each thing in each truck. (In my imagination, he would point with a cane.)
His truck-packing solutions shaved something like 30% off the total number of truck trips relative to anyone else (which the company could see from how much costs rose whenever he took a sick day), but his heuristics were hard to teach to anyone else in the warehouse.
Also, there was only one of him, and he was getting ready to retire.
So they hired programmers (including the young CS prof, who had not yet moved into academia) to build a system to boss around the young men with strong backs. The system included “an expert system based on the old guy” but could also, under the hood, run other heuristic solvers and try a bunch of other things based on math and computational experiments and A/B testing and so on.
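For flavor, the simplest member of that “other heuristic solvers” family is something like first-fit decreasing, sketched below for the 1-D version of the packing problem. To be clear, this is my illustrative assumption about the genre, not the actual system from the story:

```python
def first_fit_decreasing(item_sizes, truck_capacity):
    """Greedy 1-D packing heuristic: take items largest-first and put each
    into the first truck with room, opening a new truck when none fits."""
    remaining = []  # spare capacity per truck
    loads = []      # items assigned to each truck
    for size in sorted(item_sizes, reverse=True):
        for i, spare in enumerate(remaining):
            if size <= spare:
                remaining[i] -= size
                loads[i].append(size)
                break
        else:  # no existing truck had room; open a new one
            remaining.append(truck_capacity - size)
            loads.append([size])
    return loads

print(first_fit_decreasing([9, 8, 2, 2, 5, 4], truck_capacity=10))
# [[9], [8, 2], [5, 4], [2]] -> 4 trucks
```

First-fit decreasing is provably within roughly 22% of optimal for 1-D bin packing (asymptotically), which is the kind of “good enough” a computational experiment can certify. The old man’s 3-D version is far messier, which is exactly why his intuitions were worth bottling into an expert system.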
The system never actually beat the old man, but eventually it was good enough, and the guy retired and the programmers got paid and moved on. No one followed up 5 or 10 or 20 years later to check on the trucking company. Maybe it was a disaster, or maybe not.
(The above is based on memory and might work better as an “urban legend.” I could personally ask a specific CS prof for more details about “that specific old truck packing guy in that story you told,” and maybe the old man only shaved 10% off the total truck trips needed? Or maybe 50%? Maybe the system beat the man. Don’t trust my details. I’m trying to point to a macro-phenomenon larger than the specific details: humans are general reasoners, their knowledge is put into machines, and then humans obey these machines instead of EITHER other humans OR their own ability to reason and get better at reasoning.)
Normal humans (who cannot program, because the median person seems to be, for one reason or another, “inalgorate”) mostly don’t notice this as the general social prototype for what happens when computers are deployed in place of humans. It is weird. I can think of cultural ideas to “fix” this state of affairs, but none so far that pass a “weirdness filter” <3