[Question] What is the relationship between Preference Learning and Value Learning?
It appears that in the last few years the AI Alignment community has dedicated great attention to the Value Learning Problem [1]. In particular, the work of Stuart Armstrong stands out to me.
Concurrently, over the last decade, researchers such as Eyke Hüllermeier and Johannes Fürnkranz have produced a significant amount of work on preference learning [2] and preference-based reinforcement learning [3].
While I am not highly familiar with the Value Learning literature, I consider the two fields closely related, if not overlapping. Yet I have rarely seen Value Learning work reference the Preference Learning literature, or vice versa.
Is this because the two fields are less related than I think? More specifically, how do the two fields relate to each other?
References
[1] - Soares, Nate. “The value learning problem.” Machine Intelligence Research Institute, Berkeley (2015).
[2] - Fürnkranz, Johannes, and Eyke Hüllermeier. Preference learning. Springer US, 2010.
[3] - Fürnkranz, Johannes, et al. “Preference-based reinforcement learning: a formal framework and a policy iteration algorithm.” Machine Learning 89.1-2 (2012): 123-156.