[Question] Does anyone’s full-time job include reading and understanding all the most-promising formal AI alignment work?
(By “most promising” I mostly mean “not obviously making noob mistakes”, with the central examples being “any Proper Noun research agenda associated with a specific person or org”.)
(By “formal” I mean “involving at least some math proofs, and not solely coding things”.)
Asking because the field is relatively small, and also because I’m not sure any single person “gets” all of it anymore.
Example that made me ask this (not necessarily a central example): Nate Soares wrote this about John Wentworth’s work, but then Wentworth replied saying it was inaccurate about his current/overall priorities.
I’m just curious, why the specification of math proofs? I know of some modestly promising ideas for aligning the sorts of AGI we’re likely to get, and none of them were originally specified in mathematical terms. Tacking maths onto those wouldn’t really be useful. My impression is that the search for formal proofs of safety has failed and is probably hopeless. It also seems like adding mathematical gloss to ML and psychological concepts is more often confusing than enlightening.
It’s to differentiate from more-obviously-tractable, less-formalism/conceptual/deconfusion-based research agendas, e.g. HCH. As asked, I’m looking for info specifically related to this other kind.