Yeah, my sense is others (like Anthropic) followed along after OpenAI did that, though it seemed to me to go mostly against consensus in the alignment field (though I agree it’s messy).
Huh, interesting. Maybe the OpenAI statements about their models being “more aligned” came earlier than that, in the context of InstructGPT? I definitely feel like I remember some Twitter threads and LW comment threads about it in the context of OpenAI announcements, and nothing in the context of Anthropic announcements.
Pretty sure Anthropic’s early assistant stuff used the word this way too: see e.g. Bai et al., https://arxiv.org/abs/2204.05862
But yes, people complained about it a lot at the time
(The Anthropic paper I cited predates ChatGPT by 7 months)
This is likely not the first instance, but OpenAI was already using the word “aligned” in this way in 2021 in the Codex paper.
https://arxiv.org/abs/2107.03374 (section 7.2)
Ah, you’re correct, the “more aligned” language is from the original InstructGPT release in Jan 2022:
https://openai.com/index/instruction-following/