[Question] Does anyone’s full-time job include reading and understanding all the most-promising formal AI alignment work?
(By “most promising” I mostly mean “not obviously making noob mistakes”, with the central examples being “any Proper Noun research agenda associated with a specific person or org”.)
(By “formal” I mean “involving at least some math proofs, and not solely coding things”.)
Asking because the field is relatively small, and also because I’m not sure any single person “gets” all of it anymore.
Example that made me ask this (not necessarily a central example): Nate Soares wrote this about John Wentworth’s work, but then Wentworth replied saying it was inaccurate about his current/overall priorities.
I’m just curious, why the specification of math proofs? I know of some modestly promising ideas for aligning the sorts of AGI we’re likely to get, and none of them were originally specified in mathematical terms. Tacking maths onto those wouldn’t really be useful. My impression is that the search for formal proofs of safety has failed and is probably hopeless. It also seems like adding mathematical gloss to ML and psychological concepts is more often confusing than enlightening.
It’s to differentiate from more-obviously-tractable, less-formalism/conceptual/deconfusion-based research agendas, e.g. HCH. As asked, I’m looking for info specifically related to this other kind.