It’s directly about inverse reinforcement learning, but that should be strictly stronger than RLHF. Seems incumbent on those who disagree to explain why throwing away information here would be enough of a normative assumption (contrary to every story about wishes.)
https://arxiv.org/abs/1712.05812
It’s directly about inverse reinforcement learning, but that should be strictly stronger than RLHF. Seems incumbent on those who disagree to explain why throwing away information here would be enough of a normative assumption (contrary to every story about wishes.)