Whose work is relevant, according to you?
Lots of people’s work:
Paul’s work (ELK more than RLHF, though RLHF was useful for seeing what happens when you throw RL at LLMs, which is kind of similar to how I get some value out of Chris’s work)
Eliezer’s work
Nate’s work
Holden’s writing on Cold Takes
Ajeya’s work
Wentworth’s work
The debate stuff
Redwood’s work
Bostrom’s work
Evan’s work
Scott and Abram’s work
There is of course still huge variance in how relevant these different people’s work is, and how much it goes for the throat, but all of it seems more relevant to AI Alignment/AI-not-kill-everyoneism than Chris’s work (which, again, I found interesting, but not super interesting).
Do you mean Evan Hubinger, Evan R. Murphy, or a different Evan? (I would be surprised and humbled if it was me, though my priors on that are low.)
Hubinger
Definitely not trying to put words in Habryka’s mouth, but I did want to make a concrete prediction to test my understanding of his position; I expect he will say that:
the only relevant work is work that tries to directly tackle what Nate Soares described as “the hard bits of the alignment challenge” (and Habryka basically agrees with Soares about what those hard bits are)
nobody is fully on the ball yet
but agent-foundations-like research by MIRI-aligned or formerly MIRI-aligned people (Vanessa Kosoy, Abram Demski, etc.) is, in theory, the most relevant
however, in practice, even that is kinda irrelevant because timelines are short and that work is progressing too slowly to be useful even for deconfusion purposes
Edit: I was wrong.