Gordon Seidoh Worley comments on Formal Philosophy and Alignment Possible Projects

Gordon Seidoh Worley 1 Jul 2022 0:51 UTC
LW: 2 AF: 1
Re Project 4: you might find my research agenda for deconfusing human values useful. It is semi-abandoned, mostly because I wasn’t, and still am not, in a position to make further progress on it.
- Jan 2 Jul 2022 13:17 UTC
  LW: 3 AF: 2
  This work by Michael Aird and Justin Shovelain might also be relevant: “Using vector fields to visualise preferences and make them consistent”
  And I have a post where I demonstrate that reward modeling can extract utility functions from non-transitive preference orderings: “Inferring utility functions from locally non-transitive preferences”
  (Extremely cool project ideas btw)
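
To make the point about non-transitive preferences concrete, here is a minimal sketch. It is not the method from the linked post: it swaps in a plain Bradley-Terry model fitted by gradient ascent as a stand-in for reward modeling. The items `A` to `D`, the comparison counts, and the helper `fit_utilities` are all hypothetical, chosen only for illustration. The data contains a local cycle (A beats B, B beats C, C beats A), yet the fit still assigns every item a scalar utility.

```python
import math


def fit_utilities(comparisons, steps=2000, lr=0.1):
    """Fit Bradley-Terry utilities by gradient ascent on the log-likelihood.

    `comparisons` is a list of (winner, loser) pairs. Hypothetical helper,
    not taken from the linked post.
    """
    items = sorted({x for pair in comparisons for x in pair})
    u = {x: 0.0 for x in items}
    for _ in range(steps):
        grad = {x: 0.0 for x in items}
        for winner, loser in comparisons:
            # Model probability that `winner` is preferred over `loser`.
            p = 1.0 / (1.0 + math.exp(-(u[winner] - u[loser])))
            grad[winner] += 1.0 - p
            grad[loser] -= 1.0 - p
        for x in items:
            u[x] += lr * grad[x]
    return u


# Illustrative data (an assumption): a preference cycle among A, B, C,
# plus a mostly-dominated item D that each of them beats 3 times out of 4.
comparisons = (
    [("A", "B"), ("B", "C"), ("C", "A")]
    + [(x, "D") for x in "ABC"] * 3
    + [("D", x) for x in "ABC"]
)

utilities = fit_utilities(comparisons)
for item, value in sorted(utilities.items()):
    print(f"{item}: {value:.3f}")
```

Under these assumptions the cycle cannot be represented by any ordering, so the fit gives A, B and C essentially equal utility and places D below them. The utilities are only identified up to an additive constant, so only the differences between them are meaningful.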