What if human preferences aren’t representable by a utility function
I’m responding to this specifically, rather than the question of RLHF and ‘human irrationality’.
I’m not saying this is the case, but what if ‘human preferences’ are only representable by something more complicated than a single number? Perhaps an array or vector? Can it learn something like that?
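To make the question concrete, here is a minimal sketch (PyTorch, with made-up dimensions and the hypothetical name `VectorRewardModel`) of what "learning a vector-valued preference" could mean: the reward model's head outputs k numbers rather than one scalar utility. This isn't how any particular RLHF pipeline works; it just illustrates that the learned object could be a vector, with the open question being how (or whether) it gets collapsed back to a scalar for the policy update.

```python
import torch
import torch.nn as nn

class VectorRewardModel(nn.Module):
    """Toy reward model whose output is a k-dimensional preference
    vector rather than a single scalar utility."""

    def __init__(self, input_dim: int = 128, hidden_dim: int = 64, k: int = 4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
        )
        # k outputs, e.g. one per preference dimension
        # (helpfulness, honesty, ...) -- purely illustrative.
        self.head = nn.Linear(hidden_dim, k)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.body(x))

# The scalar-utility picture is just the special case k = 1;
# a weighted sum over the vector collapses it back to one number.
model = VectorRewardModel()
features = torch.randn(2, 128)   # features for two candidate responses
print(model(features).shape)     # torch.Size([2, 4])
```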