TL;DR: Thought this post was grossly misleading. Then I saw that the GPT3 playground/API has recently changed in notable and perhaps worrying ways. This post is closer to the truth than I thought, but I still consider it misleading.
Initially I strongly downvoted, since the LW post implies (to me) that humans provide some of the GPT3 completions in order to fool users into thinking it’s smarter than it is. Or was that interpretation of your post more in the eye of the beholder?
Nested three layers deep is one of two pieces of actual evidence:
InstructGPT is then further fine-tuned on a dataset labeled by human labelers. The labelers comprise a team of about 40 contractors whom we hired through Upwork and ScaleAI.
My impression was that InstructGPT was a new, separate model, available as an option in the API alongside the base GPT3, that was openly fine-tuned with human feedback as a way of aligning the base model. That was the whole point of this paper: https://arxiv.org/abs/2203.02155
This is very different from what I saw this post as implying: OpenAI are open about it, it’s separate from the main GPT3, and it’s not humans providing completions but humans aligning a language model. Hence the strong downvote.
(The examples about the completions for select “gotcha” prompts improving over time aren’t very compelling evidence for what this post implies. The ones changing in a day are pretty compelling, though—how weird!)
Then I opened the GPT3 API playground for the first time in a few months and realized that my understanding was outdated. Looks like InstructGPT and the old non-finetuned davinci have been merged into text-davinci-002, which is now the default model.
Trying the “Does grape juice taste better if you add sour milk?” prompt many times over keeps giving me the exact same answer that Gary got, even at max temperature. To test where on the spectrum between “aligned using human review” and “repeating human answers” this lies, I tried some variations (a code sketch of the test follows the examples):
Prompt: Does grape juice taste better if you add sour cream?
Completion: No, grape juice does not taste better if you add sour cream.

Prompt: Does wine taste better if you add sour cream?
Completion: There is no definitive answer to this question as it depends on personal preferences. Some people might find that wine tastes better when paired with sour cream, while others may not notice a difference. Ultimately, it is up to the individual to decide whether or not they think wine tastes better with sour cream.

Prompt: Should I add sour milk to grape juice in order to improve taste?
Completion: It is not necessary to add sour milk to grape juice in order to improve its taste. Grape juice is already a tart and flavorful beverage. However, if you enjoy the taste of sour milk, you may add it to grape juice to create a tart and refreshing drink.
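For anyone who wants to reproduce this, the test boils down to resampling each prompt variant a few times at max temperature and counting distinct completions. Here’s a minimal sketch, assuming the legacy openai-python Completion API that text-davinci-002 was served through; the sample count and token limit are arbitrary choices of mine, not a record of my exact runs:

```python
import openai

openai.api_key = "sk-..."  # your API key

PROMPTS = [
    "Does grape juice taste better if you add sour milk?",
    "Does grape juice taste better if you add sour cream?",
    "Does wine taste better if you add sour cream?",
    "Should I add sour milk to grape juice in order to improve taste?",
]

for prompt in PROMPTS:
    completions = set()
    for _ in range(5):  # five samples per variant
        response = openai.Completion.create(
            model="text-davinci-002",
            prompt=prompt,
            temperature=1.0,  # the playground's maximum
            max_tokens=100,
        )
        completions.add(response["choices"][0]["text"].strip())
    # A single distinct completion across five max-temperature samples
    # is the canned-answer behavior described above.
    print(f"{prompt!r} -> {len(completions)} distinct completion(s)")
```

If the fix had generalized, you’d expect the rephrasings to behave like the original prompt; instead they fan out into the generic completions quoted above.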
While GPT3 might not literally outsource a portion of the requests to MTurk, I don’t think it’s unfair to say that some of the completions are straight-up human-provided. If the corrected completion had been added in a way that generalized (e.g. by aligning the model with human feedback as in the paper), it would have been a different story. But it clearly wasn’t.
So to recap:
- the curation of InstructGPT is now in the default model
- human completions are substituted within a day in response to publicized embarrassing completions (I’m alleging this)
- human completions aren’t added such that the model is aligned to give more helpful answers, because very similar prompts still give bad completions
In addition, and more intangibly, I’m noticing that GPT3 is not the model I used to know. The completions vary a lot less between runs. More strikingly, they have this distinct tone: it reads like an NYT expert fact-checker or first-page Google results for a medical query.
I tried one of my old saved prompts for a specific kind of fiction, and the completion was very dry and boring. The old models are still available, and the prompt works better there. But I won’t speculate further, since I don’t have enough experience with the new (or the old) GPT3.
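For anyone who wants to run the old-vs-new comparison themselves, it’s the same call against both models. A sketch under the same legacy-API assumption as above; FICTION_PROMPT is a hypothetical stand-in, not my actual saved prompt:

```python
import openai

openai.api_key = "sk-..."

# Hypothetical placeholder, not the saved prompt mentioned above.
FICTION_PROMPT = "Write the opening paragraph of a noir story set on a generation ship."

for model in ("davinci", "text-davinci-002"):  # old base model vs. new default
    response = openai.Completion.create(
        model=model,
        prompt=FICTION_PROMPT,
        temperature=0.9,  # typical settings I'd use for fiction
        max_tokens=200,
    )
    print(f"--- {model} ---")
    print(response["choices"][0]["text"].strip())
```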