Yeah, my sense is others (like Anthropic) followed along after OpenAI did that, though it seemed to me to go mostly against consensus in the alignment field (though I agree it’s messy).
Huh, interesting. Maybe the OpenAI statements about their models being “more aligned” came earlier than that, in the context of InstructGPT? I definitely feel like I remember some Twitter threads and LW comment threads about it in the context of OpenAI announcements, and nothing in the context of Anthropic announcements.
Pretty sure Anthropic’s early assistant stuff used the word this way too: see e.g. Bai et al., https://arxiv.org/abs/2204.05862
But yes, people complained about it a lot at the time
(The Anthropic paper I cited predates ChatGPT by 7 months)
This is likely not the first instance, but OpenAI was already using the word “aligned” in this way in 2021 in the Codex paper.
https://arxiv.org/abs/2107.03374 (section 7.2)
Ah, you’re correct, the “more aligned” language is from the original InstructGPT release in Jan 2022:
https://openai.com/index/instruction-following/