Essentially, we have to make sure that humans give feedback that matches their preferences...
Minor rant about this in particular:
Humans’ stated preferences do not match their preferences-in-hindsight, neither of those matches humans’ self-reported happiness/satisfaction in-the-moment, none of that matches humans’ revealed preferences, and all of those are time-inconsistent. IIRC, the first section of Kahneman’s Well-Being: The Foundations of Hedonic Psychology is devoted entirely to the problem of getting feedback from humans on what they actually like, and the tl;dr is “people have been working on this for decades and all our current proxies have known problems” (not to say they don’t have unknown problems too, but they definitely have known problems). Once we get past the basic proxies, we pretty quickly run into fundamental conceptual issues about what we even mean by “human preferences”.
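To make the time-inconsistency point concrete, here is a minimal sketch (my illustration, not from the original comment) of the classic preference-reversal result under hyperbolic discounting; the discount rate k = 0.2 is an arbitrary value chosen just to make the numbers come out cleanly.

```python
# Hyperbolic discounting: present value V = A / (1 + k*t) for a reward of
# size A delayed by t days. k = 0.2 is a hypothetical rate for illustration.
def present_value(amount, delay_days, k=0.2):
    return amount / (1 + k * delay_days)

# Asked today, the smaller-sooner reward wins:
# $100 now (V = 100.0) beats $110 tomorrow (V ~ 91.7).
assert present_value(100, 0) > present_value(110, 1)

# Push both options 30 days into the future and the ranking flips:
# $100 in 30 days (V ~ 14.3) loses to $110 in 31 days (V ~ 15.3).
assert present_value(100, 30) < present_value(110, 31)
```

The same person ranks the same pair of outcomes differently depending on when you ask, which is exactly why “just elicit the preference” doesn’t pin down a single answer.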